Reflection on Working with Voyant

Voyant is an open-source, web-based tool for analyzing unstructured text data. It allows users to perform both text mining, or searching for words specified by the user, and topic modeling, in which the system identifies the topics around which the words in a document (or, for a collection of documents, a corpus) cluster.

To start, users must upload text to the Voyant platform. There are three ways to do this: type or paste text into the Voyant box; type or paste URLs, one URL per line; or upload files, which may be formatted as plain text, HTML, XML, RTF, Word, or PDF.

Voyant offers users several ways to study word frequencies in a document or corpus. The most visually stimulating, and probably the first tool that users engage with, is the Cirrus word cloud. The size of a word in a Cirrus cloud indicates its frequency. If a user clicks on a word in the cloud, its frequency is displayed. The Reader displays the whole text of a document or corpus. When users click on a word, the Reader also shows its frequency. The Summary, meanwhile, indicates the length of documents, the uniqueness of a document relative to its corpus based on the proportion of its words that are distinctive, and the top distinctive words. The Trends tool displays a line graph to provide another visual representation of the frequency of a word or words in a document or corpus. The Trends tool may be manipulated to display word frequencies both within a single document and across documents in a corpus. Finally, the Contexts tool shows the words surrounding a word or phrase.

If documents in a corpus are differentiated by author, time, or place of production, Voyant enables users to compare how words are used across those differences.

To get acquainted with Voyant, I worked with text transcriptions of interviews with former slaves, compiled by the Works Progress Administration during the 1930s and known as the “WPA Slave Narratives.” The texts were organized by the states in which the interviews were conducted.

I then performed three exercises with the Voyant toolbox. First, based on the initial word cloud that Voyant produced, I chose the words “slaves” and “free” and examined their frequency in the corpus and their relative frequency by state. Using principally the Trends tool, I noticed that the words’ frequency varied by region: in upper South and northern states (like Maryland and Ohio), “slaves” and “free” were more likely to appear than in lower South states (like Mississippi). Accounting for the difference would require additional research, but one possibility, as suggested in the assigned reading (Sharon Musher, “The Other Slave Narratives: The Works Progress Administration Interviews,” 2014), is that both interviewers and interviewees in the lower South in the 1930s, where African Americans’ social and political status at the time was more tenuous than elsewhere, were more likely to use euphemisms for slavery and freedom.
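Comparing a word across documents of different lengths only makes sense if raw counts are normalized, which is what a relative frequency does. The sketch below shows one simple way to compute it. The function name `relative_frequency`, the per-1,000-words scale, and the two placeholder documents (`doc_a`, `doc_b`) are assumptions for illustration; they are not Voyant's implementation and not data from the WPA narratives.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(text, term, per=1000):
    """Occurrences of term per `per` words, so long and short documents are comparable."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens) * per


# Placeholder documents with invented text; in practice each would be one state's file.
corpus = {
    "doc_a": "free free work home",
    "doc_b": "work home field river work home field river",
}

trend = {name: relative_frequency(text, "free") for name, text in corpus.items()}
print(trend)
```

Plotting the values in `trend` for each document, in order, would give a line like the one the Trends tool draws across a corpus.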

Second, using the Summary tool, I chose several words that Voyant determined were distinctive within each state, compared to their usage across the whole corpus. These words were “marster” in Georgia, “mistis” in Alabama, “massy” in Florida, and “mistis” in Mississippi. The different dialectal terms for “master” in Georgia and Florida may suggest a regional variation. The frequent use of the term “mistis” by interviewees in both Alabama and Mississippi invites further exploration. Additional meanings, and even sentiments, of these terms could be gleaned by reviewing their usage in context with the Voyant Contexts tool.
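One common way to score how distinctive a word is within one document relative to a corpus is TF-IDF, which rewards words that are frequent in a document but appear in few other documents. The sketch below assumes a plain TF-IDF variant; I have not confirmed that it matches the exact formula behind Voyant's Summary tool. The function name `distinctive_words` and the three placeholder documents are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(corpus, top_n=1):
    """Rank each document's words by a simple TF-IDF score (assumed, not Voyant's own)."""
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(counts)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    result = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        result[name] = ranked[:top_n]
    return result


# Placeholder documents with invented text.
corpus = {
    "doc_a": "the river the river the field",
    "doc_b": "the mill the mill the field",
    "doc_c": "the road the road the road",
}

result = distinctive_words(corpus)
print(result)
```

Note that a word appearing in every document, like “the” here, scores zero no matter how often it occurs, which is why distinctive-word lists surface unusual spellings rather than common vocabulary.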

Third, I again focused on distinctive words in two states, “marster” in Georgia and “mastuh” in Missouri. I was particularly interested in the term “mastuh” because the Trends tool showed it to be extraordinarily distinctive in Missouri compared to all other states in which the WPA slave narratives were produced. The distinction may simply underscore the subjectivity of the interviewers’ methods for recording and transcribing the slave narratives. But even so, the novelty of the word among the Missouri interviews may reflect a more nuanced transcription of former slaves’ pronunciation of the word there. This is an exciting prospect, because it suggests that Voyant may allow users to pinpoint regional patterns in how former slaves actually sounded when they spoke.

Voyant is an ideal research tool for scholars and members of the general public who are interested in the frequency or uniqueness of words and phrases in large text databases. It is particularly useful in light of the great increase in the availability of digitized databases in the humanities and social sciences in the last few decades.
