Tools for a Digital Humanities

I’ve recently discovered Project Bamboo, an initiative that describes itself on the project website as a multi-institutional, interdisciplinary, and inter-organizational effort that brings together researchers in arts and humanities, computer scientists, information scientists, librarians, and campus information technologists to tackle the question:

“How can we advance arts and humanities research through the development of shared technology services?”

Come again? At first, the concept of shared technology services may seem a little vague. But a closer look at the full project proposal makes it fairly clear what is meant.

While academics use digital technology and the Net for a wide variety of things (research, teaching, publishing, communication), all of these uses have a degree of improvisation to them. Very few of the tools we use are developed specifically for the context of science and research, and sometimes this limitation shows.

For example, I’ve started to use del.icio.us to tag all the books I read in Google Books (see what I’ve recently tagged). Del.icio.us is an all-purpose bookmark management application, yet the ability to collaboratively create bibliographies with colleagues in the same subfield makes it a useful tool for researchers. Del.icio.us is not the only example - Google Documents can be used to collaboratively work on a publication and SlideShare is great for making your presentations available directly and linking them to your CV (see my own), instead of just offering them for download. But for other, more specialized tasks there is still a severe lack of tools.

A few months ago, a colleague of mine needed a corpus (a collection of texts for linguistic analysis) for her research. Corpora exist in a wide variety of shapes and sizes, but the specific issue she was working on made it necessary for her to create an entirely new corpus (built from blog texts) instead of working with material from more traditional sources (newspapers, fiction, etc.). In addition, she had only a basic working knowledge of corpora and the ways in which they can be used.

We approached the problem from two different angles. I helped her build a specialized corpus by using a piece of software that I had developed for my own work on blogs. To analyze the data, I pointed her to two interesting functions of Many Eyes, a web-based application for visualizing statistical information: tag clouds and word trees.

Tag clouds (or, in this case, word clouds) make it possible to visualize how often a word occurs in a piece of writing. Simply paste a text into the appropriate form field on the site and Many Eyes will do the rest (have a look at this cloud for Shakespeare’s complete works for a nice example).
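For readers curious about what happens behind that form field, here is a minimal Python sketch of the counting step behind a word cloud. It assumes a very simple tokenizer (lowercase letters and apostrophes only) and linear scaling of font sizes; the function names and the 10 to 48 point range are illustrative choices of mine, not a description of how Many Eyes actually works.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count how often each word occurs in a text (hypothetical helper)."""
    # Assumption: a "word" is any run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def cloud_sizes(counts, min_pt=10, max_pt=48):
    """Map each word's count linearly onto a font size in [min_pt, max_pt]."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        w: min_pt + (c - lo) * (max_pt - min_pt) / span
        for w, c in counts.items()
    }
```

Run on "To be, or not to be: that is the question.", "to" and "be" get the largest size and every other word the smallest. A real corpus tool would at least add a stopword list, which is why `word_frequencies` takes one as an optional parameter.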

Word trees visualize textual data in another way, in essence allowing the reader to navigate from one word to the words that follow it.
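A word tree can be approximated in a similar spirit: pick a root word, collect the words that follow each of its occurrences, and merge shared phrases into the same branch. The sketch below is a hypothetical illustration (the functions `word_tree` and `print_tree` and the `depth` cutoff are my own), not Many Eyes' implementation.

```python
import re


def word_tree(text, root, depth=3):
    """Build a nested dict of the phrases that follow `root` (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    tree = {}
    for i, w in enumerate(words):
        if w != root:
            continue
        node = tree
        # Walk up to `depth` words past this occurrence of the root,
        # reusing existing branches so shared phrases are merged.
        for nxt in words[i + 1:i + 1 + depth]:
            node = node.setdefault(nxt, {})
    return tree


def print_tree(tree, indent=0):
    """Print the tree with two spaces of indentation per level."""
    for word, children in tree.items():
        print("  " * indent + word)
        print_tree(children, indent + 1)
```

Called as `print_tree(word_tree(text, "to", depth=2))` on "To be, or not to be: that is the question.", it prints `be` once, with `or` and `that` branching beneath it. That merging of shared continuations is what lets you navigate from one word to the next.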

There are of course specialized tools for corpus analysis that do a whole lot more than this in terms of statistics, and Many Eyes lacks a whole range of features that a genuine linguistic research tool would need (say, differentiating between word classes). Yet Many Eyes has several advantages that the more specialized tools lack. It is

  • web-based
  • freely accessible
  • easy to use
  • versatile

In a sense, the points above make all the difference. Desktop-based software is under all sorts of constraints: you have to acquire it, install it and figure out how to get data from and to it, keep it up to date and do all sorts of other “chores” that have little to do with your main objective. And then you can’t even share your data and collaborate as easily as you can on the Web. In other words, you’re using a program, not a service.

Of course Project Bamboo is not just about developing new tools (well, at least not in my mind). The assumption has long been that as soon as someone puts a useful service on the web, a user community will magically appear. This may be true of web video, blogging, wikis and many other services with a broad appeal, all of which can and should be used much more in academia. But with more specialized services, adoption is something that should be actively supported. In other words: we need to do more than just develop tools. We should work to popularize general-purpose services like del.icio.us and document ways in which they can be appropriated for research and teaching - and (most importantly) how they can be connected to one another. At the same time, just putting developers and researchers into a room together can produce impressive results.

A great example of both a mashup of services and a new way of looking at data is the Web version of the World Atlas of Language Structures (WALS). It’s a combination of Google Maps with the print version of the atlas, which shows the distribution of linguistic features across the world’s languages (say, which languages have definite articles). Not only is WALS Online more convenient to use than both the print version and the CD-ROM that comes with it (not to mention that it is also free), but it makes entirely new uses possible. Think about collaborative annotation or linking research articles directly to WALS. Imagine a paper that lives on the Web and shows a map section from WALS in a side window, with the text flowing around it.

Developing services like WALS and getting them out there has the potential to completely transform academia in the long run, making it much more collaborative and transparent than it is today. It will be exciting to see what role Project Bamboo plays in that context.

Edit: I forgot to include a link to the project outline, plus a workshop transcript and some background information.

The Harvard Open Access Policy - could it kill peer-reviewed journals?

The question smells of hyperbole, but it’s an idea that has been rather persistent for me. But let’s start at the beginning.

If you’re active in the Open Access community, you’ve probably read about nothing else in the last week: the Harvard Open Access resolution. In a nutshell, everything that’s published by members of the faculty will be made available on the Net for free, unless the author asks for an exemption. While some scholars might request exemptions, it means that the bulk of what is published by researchers at Harvard will be Open Access from now on.

From Everybody’s Libraries:

This is the first university-level open access mandate in the US, from the most prominent university in the US, and as many have noted, this is a huge step forward for open access to research. There are two aspects to the mandate: the familiar aspect directs faculty to supply Harvard copies of their papers to post; the more novel aspect stipulates that Harvard automatically get the rights to post their faculty papers for free. Harvard allows faculty members to exempt papers from these requirements, but it must be done in writing, with reason, separately for each paper that a faculty member wants to exempt.

I find this approach ingenious. As people maintaining institutional repositories have come to know, there are two main barriers to distributing one’s faculty’s work in one’s repository: getting hold of the work, and getting the right to publish the work. The first of these can be handled in various ways; whether the faculty, the departmental administrators, or the librarians get the content to the right place, it’s all purely a matter of local negotiation. But that’s not the case with rights. By the time we repository maintainers get content from authors, the authors have often signed their rights away to the journals that published the papers. The publishers have effectively called dibs on redistribution rights, and we can’t distribute unless they agree to it. A faculty member that may want to have us distribute her work too may no longer have the power to let us– she’s already signed that right away to someone else.

In a sense, the question of how Open Access can be facilitated has always been discussed by the wrong people. No level of activism could ever solve the key problem: that the majority of researchers do not truly care about how their work is distributed - and why should they? Harvard’s decision has the potential to make what seemed a complicated situation rather simple:

  • to get a job at a prestigious university, a scholar must sign an agreement to publish OA
  • when the scholar has an article ready for publication, he forwards it to the librarian who manages the institution’s repository (or to an admin who takes care of that)
  • anything that ends up in the repository is globally available via Google Scholar and similar services
  • keyword searches combined with a knowledge of the disciplinary landscape (i.e. I know that X, Y and Z have published things relevant to my research before - what about their other work?) are how researchers find relevant sources

What does this mean for traditional peer-review and the future of scientific journals?

I think that, quite plausibly, this could be the beginning of the end for both of these institutions.

Think about it. Right now, the idea of quality control via commentary and evaluation of a piece of research is married to making it available. An article is only published after having been reviewed, because that is how the print process works. But once digital availability is guaranteed regardless of quality, this no longer makes any sense: evaluation and discussion of a paper and its availability are two separate issues. Journal publishers will no longer have to fuss around with technical issues if publication, storage and archiving are handled through their institution’s repository. Those functions will be entirely where they should have been in the first place: with the libraries. Repositories will replace journals as the ‘place’ where articles are stored - the exciting question is what will replace them as the place where they are discussed and evaluated. It’s hard not to see the immense potential for open peer review and moderated discussions. And once papers truly live on the Net (i.e. are hypertext and freely accessible) it is only logical that they will be linked and cross-referenced in the same way that blogs are.

I know that there are skeptics who believe that this will have a negative impact on the quality of published research. But that mistakes the Internet for a browsable medium, a resource that anyone could ever look at in its entirety. It no longer makes any sense that only what has been deemed worthy should be available at all. What is truly significant scientifically will be recognized by peers and separated from what is of lesser relevance, as has always been the case. But availability and quality will no longer be two sides of the same equation.

iScience (Part 1): Me me me

This is the first part in a series of posts in which I’ll think aloud about the future of academic research and the role that social software could (should?) play in it. My central idea is that research should become more transparently collaborative and that publicly funded projects and initiatives should focus on enabling individual researchers instead of institutions. Too often, what is described by the term e-science* is the development of unwieldy and byzantine systems that seek to anticipate and solve a huge array of problems, many of which have already been solved elsewhere. Because we tend to conceptualize software as tools - objects that can be used to perform certain tasks - we tend to believe that more functions equate to a better product. This view is problematic because it ignores the situation of data in a networked environment, where the user is free to use a variety of different web-based services in combination and can thus effectively create his own system. I want to begin by looking at how we can use social software as an information management tool.

* I’m not talking about scientific grid computing here (the original meaning of e-science), but about more general tools for areas such as academic publishing, information and knowledge management, and teaching, which are also often described as e-science applications.

*****

A while ago, I looked at the slides for a presentation on something pretty and colorful that either started with “e” or ended with “2.0”. I’m afraid I’ve forgotten what exactly it was about, as all those fancy products and services eventually become a blur in my oversaturated cortex. But it wasn’t really the presentation as such that I found interesting. Going through the slides, I came across this memorable quote from a Japanese student that caught my attention:

When you lose your cell phone, you lose a part of your brain.

The quote got me thinking. What service or device equates to part of my brain for me (apart from my cell phone)?

The answer? iGoogle.

I’ve been using the service for only a few months now, but in conjunction with a handful of other products (many of which are integrated into my iGoogle page via widgets) it has become the single place where my email, appointments and bookmarks live together. Beyond that, I also use it to store ideas that spontaneously pop into my head. I keep a virtual scratchpad for notes. I have a to-do list with prioritized items. I have access to my calendar, email, feeds, bookmarks and documents when I log in, no matter where I am. Other services such as My Yahoo! do the same thing. They allow you to build a personal information ecology that’s always at your fingertips.

Screenshot of my iGoogle page

Right, so what’s so special about this?

First, there’s the fact that iGoogle allows you to tie different informational strands together in a personalized environment. We have enough neat applications and more than enough sources of information. The problem is that they all live in different places and that they usually don’t talk to each other. A lot of people have already pointed this out, but it’s something that can’t really be said often enough: we have to stop thinking that we need better, bigger tools with more functions when what we really need is better integration of existing “little” tools into personalized informational mosaics.

The second advantage is that your personal informational bundle is accessible everywhere you go, as long as there’s a computer with an Internet connection available.

Third (and this tends to be overlooked), you can’t ever really lose a piece of information that you create or maintain online. I lose paper notes all the time and a hard drive can die unexpectedly. Sure, you can counter the former problem by being better organized than I am and the latter one by keeping backups, but information on the Web is virtually indestructible.

Fourth, you can share everything. I’ve been using Google Documents for quite a while without sharing any of my files, but recently we were brainstorming for a collaborative project and the document sharing feature turned out to be very useful. And sharing bookmarks on del.icio.us has vast potential for groups of collaborators.

The key point is that what’s presented in iGoogle is not just information, it’s my information. I can arrange it around myself in a pattern that makes sense to me in the same way that I arrange furniture in my office. It’s a pattern that can change over time and that only has to appeal to me - it’s optimized for my personal informational needs. This kind of individualized coherence makes certain things possible. Think about it like this: when all your colleagues have their offices in the same hallway as you do, you can easily drop in for a chat or to discuss an idea that just popped into your head. Now think about how most research tools work. Are they part of a pattern, a pattern that can be rearranged by the user? Generally the answer is no.

We tend to associate the whole Web 2.0 shebang with tuned-in, social-media-creating adolescent hipsters who supposedly do nothing all day long but “share and remix” content, but when you think about it, “share and remix” is what researchers have been doing for hundreds of years, albeit with different tools. The free dissemination of human expression is what characterizes social media, we are told. Wait, isn’t that what science is all about? Of course science is hardly just about expressing oneself. Among other things we have peer review, academic titles and scholarly societies to ensure that what is published under the label “research” is not just opinion. And you can argue that disseminating an article on solar physics via arXiv.org is not the same thing as uploading a video of the Mentos and Coke experiment to YouTube. But on closer inspection, that’s a difference in scope and culture: how we value the article vs. how we value the clip, what you can do with different forms of content and who can pass an authoritative judgment on uses and forms.

The practices of academic research have arguably never been more in step with the times than today. Collaboration, openness and sharing information are core values of academic communities. But many argue that while the scientific ethos may be more en vogue than ever (think about the origins of Free Software in academia) we are still lacking the right tools for science 2.0.

Is that really true? I want to take a little time and look at what networked research tools we have and why, by and large, we are not using them.

The second part of this essay will present and discuss a number of tools for web-based research and collaboration.


License

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 License.