Tim Roberts · Public history & historical thinking, Website user experience

What Can You Do With Crowdsourced Digitization?

Everyone’s secret information friend, Wikipedia, emphasizes the commercial aspects of crowdsourcing, but lately crowdsourced digitization has become a powerful tool for humanities organizations to obtain valuable digitizing work from the public at large. Crowdsourcing is arguably the logical end of the knowledge revolution wrought by digital humanities, by which the meaning or truth of an historical source is derived not from the testimony of an expert scholar but from the sheer quantity of well-meaning amateur editors.

Crowdsourcing may actually involve several activities, although all address the shortcomings of computer-generated digitized text versions of original paper documents. It may involve adding descriptive tags or correcting errors in computer-generated digital texts, as in the newspaper-correction tasks of the National Library of Australia’s Trove project. It may ask contributors to verify shapes (technically, “polygons”) and colors on old city maps, as offered by the New York Public Library’s Building Inspector (BI) platform. Or it may involve transcription of digitally reproduced copies of handwritten documents, as needed by University College London’s (UCL) Transcribe Bentham project or the War Department Papers project of the Center for History & New Media at George Mason University. At some point, presumably, computers will be able to render computationally intelligible shapes of historical objects and completely reliable digitized copies of historical sources, obviating the need for follow-up human correction or transcription.

The four projects noted above also suggest the different scopes that crowdsourcing projects might take on, which, to a large extent, will influence the kinds of contributors who find the project’s mission worth their involvement (again, a crucial evolution in digital humanities is the cultivation of these contributors, as distinct from the users, of a project’s knowledge creation). The Trove project seeks to tell all the stories of Australia, one at a time, principally through newspapers. But despite its hosting by the national library, Trove’s community likely consists of countless local or community historians and genealogists. This kind of affinity is also the case for Building Inspector: although New York is America’s metropolis, its micro-historical focus on the city’s forgotten architecture, coupled with its exercises in the manipulation and interpretation of shapes and colors, probably appeals mainly to casual residents, high school students, and local history buffs. In contrast, Transcribe Bentham and the War Department Papers projects will appeal to more highbrow communities of contributors – scholars interested in intellectual, political, and diplomatic history. It may be a coincidence, but the former two projects seem most keen to remind contributors that their work is helping build a larger (virtual) community, while the latter two seem more confident that contributors will be self-motivated.

Crowdsourcing projects’ interfaces should ensure that whatever contributors do is appreciated, even if they just visit the site without contributing anything. And contributors should be allowed to remain anonymous where possible. Even a request for an email address, today, conjures images of “promotions” crowding a contributor’s spam folder or, worse, inbox. The Trove project seems to strike the right chord in requesting, but not requiring, that contributors provide more than minimal biographical data. It seems unlikely, generally, that a vandal or dissident will sabotage others’ work on any of these projects (this time, in contrast to the occasional editorial civil wars at Wikipedia). Professional or paid editors check public contributions to Transcribe Bentham and the War Department Papers, while Trove and Building Inspector rely, more democratically, on consensus community editing to find and correct errors as well as the occasional intentional (human) misrepresentation.

A key, if easily forgotten and perhaps uncertain, point about crowdsourcing’s value to the public good was made by Melissa Terras, Professor of Digital Cultural Heritage at the University of Edinburgh (and formerly Director of UCL’s Centre for Digital Humanities), in a YouTube presentation on the making and maintenance of the Transcribe Bentham project. Professor Terras observes, “Transcribing should be a means to something else. It can be enough to help people find stuff, but in my heart of hearts what I want for digital humanities is to be doing more than that. We don’t want to just be creating an index, we want to be then taking that data and doing something more. So what if you get the stuff transcribed? What can you do with it that you couldn’t do before?” This question still lurks behind, or beyond, the great work of crowdsourcing to involve lay people in the production, not merely the consumption, of digital humanities. (How) does crowdsourcing anticipate creative or newly enabled use of fully digitized Bentham papers, complete early U.S. state-making documents, centuries of Australian newspapers, or a virtual reality of at least some of New York’s boroughs (within the current BI scope: Manhattan, Brooklyn, and Queens)? Can a crowd produce a master narrative?
