<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CorpBlawg &#187; Many Eyes</title>
	<atom:link href="http://corpblawg.ynada.com/category/many-eyes/feed" rel="self" type="application/rss+xml" />
	<link>http://corpblawg.ynada.com</link>
	<description>Cornelius Puschmann on computer-mediated discourse, linguistics, open access and other things that interest him. Now discontinued - see blog.ynada.com</description>
	<lastBuildDate>Mon, 14 Mar 2011 15:48:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Why don&#8217;t we put the Google N-gram corpus on the Web?</title>
		<link>http://corpblawg.ynada.com/2008/07/08/why-dont-we-put-the-google-n-gram-corpus-on-the-web</link>
		<comments>http://corpblawg.ynada.com/2008/07/08/why-dont-we-put-the-google-n-gram-corpus-on-the-web#comments</comments>
		<pubDate>Tue, 08 Jul 2008 15:56:50 +0000</pubDate>
		<dc:creator>Cornelius</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Linguistics]]></category>
		<category><![CDATA[Many Eyes]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://corpblawg.ynada.com/2008/07/08/why-dont-we-put-the-google-n-gram-corpus-on-the-web</guid>
		<description><![CDATA[Two years ago, the news that Google was going to make available the largest collection of n-grams to the global research community that had ever been compiled sparked a lot of interest. I was among those who immediately ordered those six DVDs&#8230; and ever since they have been resting dutifully on a shelf in my [...]]]></description>
			<content:encoded><![CDATA[<p>Two years ago, the <a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html">news that Google was going to make available the largest collection of n-grams</a> to the global research community that had ever been compiled sparked a lot of interest. I was among those who immediately ordered those <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13">six DVDs</a>&#8230; and ever since they have been resting dutifully on a shelf in my office, collecting dust and reminding me that I need to bring them into a more accessible format. Alas, so many things to do, so little time.</p>
<p>Something led me to look for information on that corpus this morning and I came across <a href="http://infosthetics.com/archives/2008/05/google_web_trigram_visualization.html">this</a>. Sadly, the link to <a href="http://www.chrisharrison.net/">Chris Harrison</a>&#8217;s site no longer seems to work, but when I saw his visualization I immediately thought of <a href="http://services.alphaworks.ibm.com/manyeyes/home">Many Eyes</a>.</p>
<p>My reasoning goes a little something like this:</p>
<p>Google N-gram corpus hosted on <a href="http://pimm.wordpress.com/2007/09/25/googles-palimpsest-project-promiscuous-distribution-of-all-science-data-sets/">Google Palimpsest servers</a> + IBM&#8217;s Many Eyes = Fantastic web-based tool for linguists</p>
<p>To elaborate: Google has a gigantic database of word collocations that can be used as a baseline for all sorts of interesting analysis, but you can&#8217;t really do any of these things unless you have a user interface and enough computing juice to sift through almost 100 gigabytes of text data on the fly. On the other hand, tools like Many Eyes are amazing, but currently there&#8217;s no way to use them with a really big data set like the n-gram corpus, so their research utility is limited.</p>
<p>But it must be possible somehow to bring together</p>
<ul>
<li>the data to analyze</li>
<li>the computing power required and</li>
<li>the user interface needed to allow a non-technical person to interact with the data</li>
</ul>
<p>and to put the whole thing on the Web. It&#8217;s Google&#8217;s stated intention to host data for us and they are the owner of the n-gram dataset, so I can&#8217;t imagine there being any licensing issues. And, as if to put a cherry on that sundae, here&#8217;s <a href="http://googleresearch.blogspot.com/2008/04/research-in-cloud-providing-cutting.html">the announcement of a joint project</a> by IBM, Google and the NSF to do exactly that kind of stuff. Put the 6 DVDs on a cloud, throw in a tweaked version of Many Eyes (think <a href="http://services.alphaworks.ibm.com/manyeyes/page/Word_Tree.html">the word tree vis</a> with a few extras) and <a href="http://en.wikipedia.org/wiki/Construction_grammar">construction grammarians</a> everywhere will absolutely love it.</p>
<p>What do you think?</p>
]]></content:encoded>
			<wfw:commentRss>http://corpblawg.ynada.com/2008/07/08/why-dont-we-put-the-google-n-gram-corpus-on-the-web/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Tools for a Digital Humanities</title>
		<link>http://corpblawg.ynada.com/2008/05/16/tools-for-a-digital-humanities</link>
		<comments>http://corpblawg.ynada.com/2008/05/16/tools-for-a-digital-humanities#comments</comments>
		<pubDate>Fri, 16 May 2008 12:24:52 +0000</pubDate>
		<dc:creator>Cornelius</dc:creator>
				<category><![CDATA[Digital Humanities]]></category>
		<category><![CDATA[iScience]]></category>
		<category><![CDATA[Many Eyes]]></category>
		<category><![CDATA[Project Bamboo]]></category>
		<category><![CDATA[Web 2.0]]></category>

		<guid isPermaLink="false">http://corpblawg.ynada.com/2008/05/16/tools-for-a-digital-humanities</guid>
		<description><![CDATA[I&#8217;ve recently discovered Project Bamboo, an initiative that describes itself on the project website as a multi-institutional, interdisciplinary, and inter-organizational effort that brings together researchers in arts and humanities, computer scientists, information scientists, librarians, and campus information technologists to tackle the question: &#8220;How can we advance arts and humanities research through the development of shared [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve recently discovered <a href="http://projectbamboo.uchicago.edu/">Project Bamboo</a>, an initiative that describes itself on the project website as <em>a multi-institutional, interdisciplinary, and inter-organizational effort that brings together researchers in arts and humanities, computer scientists, information scientists, librarians, and campus information technologists to tackle the question:</em></p>
<p><em>&#8220;How can we advance arts and humanities research through the development of shared technology services?&#8221;</em></p>
<p>Come again? At first, the concept of <em>shared technology services</em> may seem a little vague. But a closer look at the full project proposal makes it fairly clear what is meant.</p>
<p>While academics use digital technology and the Net for a wide variety of things (research, teaching, publishing, communication), all of these uses have a degree of improvisation to them. Very few of the tools we use are developed specifically for the context of science and research, and sometimes this limitation shows.</p>
<p>For example, I&#8217;ve started to use <a href="http://del.icio.us/">del.icio.us</a> to tag all books I read in <a href="http://books.google.com/">Google Books</a> (see <a href="http://del.icio.us/cornelius/books">what I&#8217;ve recently tagged</a>). Del.icio.us is an all-purpose bookmark management application, yet the ability to collaboratively create bibliographies with colleagues in the same subfield makes it a useful tool for researchers. Del.icio.us is not the only example &#8211; <a href="http://docs.google.com/">Google Documents</a> can be used to collaboratively work on a publication and <a href="http://www.slideshare.net/">SlideShare</a> is great for making your presentations available directly and linking them to your CV (see <a href="http://cornelius.ynada.com/CV.html#presentations">my own</a>), instead of just offering them for download. But for other, more specialized tasks there is still a severe lack of tools.</p>
<p>A few months ago, a colleague of mine needed a corpus (a collection of texts for linguistic analysis) for her research. Corpora exist in a wide variety of shapes and sizes, but the specific issue she was working on made it necessary for her to create an entirely new corpus (built from blog texts) instead of working with material from more traditional sources (newspapers, fiction etc). In addition, she also had only a basic working knowledge of corpora and the ways in which they can be used.</p>
<p>We approached the problem from two different angles. I helped her build a specialized corpus by using a piece of software that I had developed for my own work on blogs. To analyze the data, I pointed her to two interesting functions of <a href="http://services.alphaworks.ibm.com/manyeyes/app">Many Eyes</a>, a web-based application for visualizing statistical information: <a href="http://services.alphaworks.ibm.com/manyeyes/page/Tag_Cloud.html">tag clouds</a> and <a href="http://services.alphaworks.ibm.com/manyeyes/page/Word_Tree.html">word trees</a>.</p>
<p><img src="http://services.alphaworks.ibm.com/manyeyes/images2/tag1.gif" align="left" height="308" width="542" /> Tag clouds (or, in this case, word clouds) make it possible to visualize how often a word occurs in a piece of writing. Simply paste a text into the appropriate form field on the site and Many Eyes will do the rest (have a look at <a href="http://services.alphaworks.ibm.com/manyeyes/view/S95RjIsOtha6D6kFt~mkI2~">this cloud for Shakespeare&#8217;s complete works</a> for a nice example).</p>
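<p>The counting behind a word cloud can be approximated in a few lines of Python. This is only a rough sketch of the idea, not how Many Eyes does it: the function name <code>word_frequencies</code>, the simple tokenizing pattern and the sample sentence are all assumptions made for illustration.</p>

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent words in text, ignoring case."""
    # Lowercase the text and extract runs of letters and apostrophes.
    # This is a deliberately naive tokenizer (an assumption, not a standard).
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each word; most_common sorts by descending count.
    return Counter(words).most_common(top_n)


sample = "To be, or not to be, that is the question."
print(word_frequencies(sample, top_n=2))
```

<p>A word cloud would then scale each word&#8217;s font size by its count. A real tool would probably also filter out very common function words first.</p>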
<p>Word trees visualize textual data in another way, allowing the reader in essence to navigate from one word to the next.</p>
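<p>One way to picture the data behind a word tree is as a nested dictionary: pick a starting word and record every sequence of words that follows it. The sketch below is hypothetical (the name <code>build_word_tree</code>, the <code>depth</code> parameter and the sample text are my own assumptions) and it ignores sentence boundaries for simplicity.</p>

```python
import re


def build_word_tree(text, root, depth=3):
    """Collect the word sequences that follow root into a nested dict."""
    # Naive tokenizer (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    tree = {}
    for i, word in enumerate(words):
        if word != root:
            continue
        # Walk down the tree, creating one branch per following word.
        node = tree
        for follower in words[i + 1 : i + 1 + depth]:
            node = node.setdefault(follower, {})
    return tree


sample = "I love blogs. I love corpora. I read blogs."
tree = build_word_tree(sample, "i", depth=2)
print(tree)
```

<p>Each key is a word that follows the starting term, and each nested dictionary holds the words that come after it in turn, which is the branching a word tree lets you click through.</p>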
<p>There are of course specialized tools for corpus analysis that do a whole lot more than this in terms of statistics, and Many Eyes lacks a whole range of features that a genuine linguistic research tool would need (say, differentiating between different word classes). Yet Many Eyes has several advantages that the more specialized tools lack. It is</p>
<ul>
<li>web-based</li>
<li>freely accessible</li>
<li>easy to use<br />
and</li>
<li>versatile</li>
</ul>
<p>In a sense, the points above make all the difference. Desktop-based software is under all sorts of constraints: you have to acquire it, install it and figure out how to get data from and to it, keep it up to date and do all sorts of other &#8220;chores&#8221; that have little to do with your main objective. And then you can&#8217;t even share your data and collaborate as easily as you can on the Web. In other words, you&#8217;re using a program, not a service.</p>
<p>Of course Project Bamboo is not just about developing new tools (well, at least not in my mind). The assumption has long been that as soon as someone puts a useful service on the web, a user community will magically appear. This may be true of web video, blogging, wikis and many other services with a broad appeal, all of which can and should be used much more in academia. But with more specialized services, adoption is something that should be actively supported. In other words: we need to do more than just develop tools. We should work to popularize general-purpose services like del.icio.us and document ways in which they can be appropriated for research and teaching &#8211; and (most importantly) how they can be connected to one another. At the same time, just putting developers and researchers into a room together can produce impressive results.</p>
<p>A great example of both a mashup of services and a new way of looking at data is the Web version of the <a href="http://wals.info/">World Atlas of Language Structures (WALS)</a>. It&#8217;s a combination of Google Maps with the print version of the atlas, which shows the distribution of linguistic features across the world&#8217;s languages (say, which languages have <a href="http://wals.info/feature/37">definite articles</a>). Not only is WALS Online more convenient to use than both the print version and the CD-ROM that comes with it (not to forget it is also free), but it makes entirely new uses possible. Think about collaborative annotation or linking research articles directly to WALS. Imagine a paper that lives on the Web and shows a map section from WALS in a side window, with the text flowing around it.</p>
<p>Developing services like WALS and getting them out there has the potential to completely transform academia in the long run, making it much more collaborative and transparent than it is today. It will be exciting to see what role Project Bamboo plays in that context.</p>
<p><strong>Edit:</strong> I forgot to include a link to the <a href="http://projectbamboo.uchicago.edu/files/docs/bamboo_proposal.pdf">project outline</a>, plus <a href="http://cavlec.yarinareth.net/archives/2008/05/15/second-session-project-bamboo/">a workshop transcript</a> and some <a href="http://ancientworldbloggers.blogspot.com/2008/05/bamboo-and-reactions.html">background information</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://corpblawg.ynada.com/2008/05/16/tools-for-a-digital-humanities/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What corporate blogs look like: JNJ, Chrysler, Palm, Marriott</title>
		<link>http://corpblawg.ynada.com/2007/11/01/what-corporate-blogs-look-like-jnj-chrysler-palm-marriott</link>
		<comments>http://corpblawg.ynada.com/2007/11/01/what-corporate-blogs-look-like-jnj-chrysler-palm-marriott#comments</comments>
		<pubDate>Thu, 01 Nov 2007 00:34:52 +0000</pubDate>
		<dc:creator>Cornelius</dc:creator>
				<category><![CDATA[Chrysler]]></category>
		<category><![CDATA[Corporate Blogging]]></category>
		<category><![CDATA[Johnson & Johnson]]></category>
		<category><![CDATA[Many Eyes]]></category>
		<category><![CDATA[Marriott]]></category>
		<category><![CDATA[Palm Inc]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://corpblawg.ynada.com/2007/11/01/what-corporate-blogs-look-like-jnj-chrysler-palm-marriott</guid>
		<description><![CDATA[If blogs were people, this would be a little bit like a beauty pageant. I&#8217;ve taken four blogs from my corpus of company blogs and analyzed them using IBM&#8217;s Many Eyes. Many Eyes is a hosted software tool for quick and simple data visualization &#8211; you should try it out if you ever have something [...]]]></description>
			<content:encoded><![CDATA[<p>If blogs were people, this would be a little bit like a beauty pageant. I&#8217;ve taken four blogs from my <a href="http://en.wikipedia.org/wiki/Text_corpus">corpus</a> of company blogs and analyzed them using IBM&#8217;s <a href="http://services.alphaworks.ibm.com/manyeyes/app">Many Eyes</a>. Many Eyes is a hosted software tool for quick and simple data visualization &#8211; you should try it out if you ever have something statistical to present.</p>
<p>Here are the four (randomly picked) candidates.</p>
<p>1. <a href="http://www.jnjbtw.com/">JNJ BTW</a></p>
<p class="field">Posts: 52</p>
<p class="field">Words: 17077</p>
<p class="field">Sentences: 729</p>
<p class="field">Average Word Length (AWL): 4.8</p>
<p class="field">Average Sentence Length (ASL): 23.4</p>
<p class="field">Average Words per Post (AWpP): 328.4</p>
<p>Word Cloud:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha6v7_MAu3yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f85e8c60127b.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>Word Tree:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha648VD_y3yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f85f813e1286.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>2. <a href="http://blog.chryslerllc.com/">Chrysler Blog</a></p>
<p class="field">Posts: 59</p>
<p class="field">Words: 13341</p>
<p class="field">Sentences: 780</p>
<p class="field">Average Word Length (AWL): 4.6</p>
<p class="field">Average Sentence Length (ASL): 17.1</p>
<p class="field">Average Words per Post (AWpP): 226.1</p>
<p>Word Cloud:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha6H8l2TB4yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f8635f131293.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>Word Tree:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha6S8le6I4yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f86508ab129e.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>3. <a href="http://blog.palm.com/">The Official Palm Blog</a></p>
<p class="field">Posts: 46</p>
<p class="field">Words: 9262</p>
<p class="field">Sentences: 446</p>
<p class="field">Average Word Length (AWL): 4.5</p>
<p class="field">Average Sentence Length (ASL): 20.8</p>
<p class="field">Average Words per Post (AWpP): 201.3</p>
<p>Word Cloud:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha6f8VsuW4yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f868bae212ab.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>Word Tree:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha6q8VBye4yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f86abe3612b6.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>4. <a href="http://www.blogs.marriott.com/">Marriott on the Move</a></p>
<p class="field">Posts: 60</p>
<p class="field">Words: 4937</p>
<p class="field">Sentences: 305</p>
<p class="field">Average Word Length (AWL): 4.5</p>
<p class="field">Average Sentence Length (ASL): 16.2</p>
<p class="field">Average Words per Post (AWpP): 82.3</p>
<p>Word Cloud:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha639_O1v4yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f86ec36812c5.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>Word Tree:</p>
<p><a href="http://services.alphaworks.ibm.com/manyeyes/view/SMhVnJsOtha6E9_Piy4yJ2-" style="margin: 0pt; padding: 0pt"><br />
<img src="http://services.alphaworks.ibm.com/manyeyes/static-resources/snapshot/89ade5ae15ce1b580115f86fae6c12d0.jpeg" id="blogThisImgSmall" style="border-style: solid solid none; border-color: rgb(175, 117, 93) rgb(175, 117, 93) -moz-use-text-color; border-width: 1px 1px 0pt; margin: 0pt; padding: 0pt" /></a></p>
<p class="clear">&nbsp;</p>
<p>All four candidates have between 46 and 60 entries, with word counts ranging from roughly 5,000 (Marriott on the Move) to about 17,000 (JNJ BTW). I&#8217;ve picked different starting terms for the word trees, depending on the respective company&#8217;s industry, but you can easily search inside a tree for any word that occurs in the blog.</p>
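<p>For anyone who wants to check the averages, two of them follow directly from the raw counts listed above: ASL is words divided by sentences and AWpP is words divided by posts. The snippet below is a minimal sketch under that assumption. The helper name <code>corpus_stats</code> is made up for illustration, and AWL is left out because it needs the character counts, which are not listed here.</p>

```python
def corpus_stats(posts, words, sentences):
    """Derive average sentence length and average words per post."""
    return {
        # Average Sentence Length: words per sentence, one decimal place.
        "ASL": round(words / sentences, 1),
        # Average Words per Post: words per post, one decimal place.
        "AWpP": round(words / posts, 1),
    }


# Raw counts (posts, words, sentences) as listed for each blog above.
blogs = {
    "JNJ BTW": (52, 17077, 729),
    "Chrysler Blog": (59, 13341, 780),
    "The Official Palm Blog": (46, 9262, 446),
    "Marriott on the Move": (60, 4937, 305),
}

for name, (posts, words, sentences) in blogs.items():
    print(name, corpus_stats(posts, words, sentences))
```

<p>Run as is, this should reproduce the ASL and AWpP figures given for each of the four blogs.</p>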
]]></content:encoded>
			<wfw:commentRss>http://corpblawg.ynada.com/2007/11/01/what-corporate-blogs-look-like-jnj-chrysler-palm-marriott/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

