Digital Humanities Text Analysis

Text analysis, also known as text mining, calculates statistics for large bodies of text. Historians can then draw conclusions from these statistics. For this assignment and post, I used Voyant Tools, a free web-based platform, to analyze the text of two issues of a journal.

For my analysis, I used the Illinois Catholic Historical Review. Specifically, I chose Volume 1, No. 1 and Volume 1, No. 2, both from 1918. I plugged them both into Voyant Tools using the full text option. This landed me on Voyant Tools' default analysis page. The top left corner, called Cirrus, produces word clouds, which show the most frequent words. The bigger the word, the more often it appears. I proceeded to eliminate many frequent words, such as: father, rev, Illinois, Chicago, st, Catholic, church, la, louis, bishop, history, and historical. I eliminated these because they are common words that one would expect to find in a historical Catholic journal from Chicago. After this, I was left with the following (please feel free to manipulate this!):

This revealed some interesting facets of this journal. The most interesting revelation was the prominence of the words Kaskaskia and Gibault. After this, I used the Contexts tool to look into Kaskaskia further. I discovered that it refers to a Native American village in Illinois where the Jesuits had a mission (for more information, see here).
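For readers curious about what happens behind the word cloud and the context lookup, here is a minimal Python sketch of the same idea: count words after dropping a stop list, then pull out the words around a term of interest. This is not Voyant's actual implementation. The function names are my own, the stop list contains only the words I removed by hand above, and the sample sentence is made up for illustration rather than quoted from the journal.

```python
import re
from collections import Counter

# Hypothetical stop list: only the words removed by hand in this post,
# not Voyant's built-in stop-word list.
STOP_WORDS = {
    "father", "rev", "illinois", "chicago", "st", "catholic",
    "church", "la", "louis", "bishop", "history", "historical",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(text, stop_words=STOP_WORDS, n=10):
    """Return the n most frequent tokens that are not in the stop list."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)


def keyword_in_context(text, keyword, width=5):
    """Return each occurrence of keyword with `width` tokens on either side."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append(" ".join(left + [tok.upper()] + right))
    return hits


# Made-up sample sentence, not a quotation from the journal.
sample = (
    "Father Gibault left Kaskaskia. "
    "The mission at Kaskaskia served the Illinois country."
)
print(top_terms(sample, n=3))
print(keyword_in_context(sample, "kaskaskia", width=2))
```

On a real issue, you would pass in the full text instead of `sample`. A longer stop list (including words like "the" and "at") would be needed before the counts start to look like the word cloud.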

The other parts of Voyant Tools include a reader, which shows the frequency of words while reading the text. Additionally, there is a summary window which shows a broad overview of both documents. This revealed to me that Voyant Tools analyzed just over 133,000 words to make these tables! This is a staggering number of words, and generating similar statistics by hand would have taken me years.

After this, I wanted to see what kind of people this journal talked about. The easiest way to do this was to search for people’s titles and graph them using the Trends tool. From this, I generated the following:

This shows that the most frequently mentioned people were priests (father or rev), followed by bishops (bishop), laymen (mr), and laywomen (mrs). This data shows a strong slant towards clergy.
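The title search above can be approximated in a few lines of Python. This is a rough sketch under my own assumptions, not how the Trends tool works internally: the grouping of titles into categories follows the labels in this post, the function name is hypothetical, and the two short "documents" are invented stand-ins for the journal issues.

```python
import re
from collections import Counter

# Assumed grouping of titles into categories, following the labels in this post.
TITLE_GROUPS = {
    "priests": {"father", "rev"},
    "bishops": {"bishop"},
    "laymen": {"mr"},
    "laywomen": {"mrs"},
}


def title_counts(text, groups=TITLE_GROUPS):
    """Count how often each group's titles occur as whole words in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        name: sum(tokens[title] for title in titles)
        for name, titles in groups.items()
    }


# Invented stand-ins for the two issues, for illustration only.
documents = {
    "vol1_no1": "Rev. Father Smith met Bishop Jones and Mr. Brown.",
    "vol1_no2": "Mrs. Brown wrote to Father Smith.",
}

for name, text in documents.items():
    print(name, title_counts(text))
```

Because the text is split into whole words, "mrs" is not also counted as "mr". One caveat I would flag: "father" and "rev" often appear together for the same person, so these are counts of titles, not of individual people.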

This summarizes the five basic windows in Voyant Tools. I also experimented with three other tools. The first, and coolest, is called Dreamscape. This tool takes location names in the text and plots them on a world map. While powerful, the tool has limits. For example, these journals contained histories related to Fr. Jacques Marquette, SJ. The Dreamscape tool read every reference to him as Marquette, MI.

Dreamscape analysis of the two journals.

The last two tools I used were Collocates and Textual Arc. Collocates reveals which words frequently appear together. For example, it showed that “historical” and “society” appear together 124 times.
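To make the idea of collocation concrete, here is a small Python sketch that counts which words fall within a fixed window of a keyword. Voyant's own counting rules may differ, so treat this as an illustration of the concept rather than a reproduction of the tool. The function name and window size are my assumptions, and the sample sentences are made up.

```python
import re
from collections import Counter


def collocates(text, keyword, window=5):
    """Count tokens that appear within `window` tokens of each keyword occurrence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    target = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Collect neighbors on both sides, excluding the keyword itself.
        start = max(0, i - window)
        neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbors)
    return counts


# Made-up sample text, not a quotation from the journal.
sample = (
    "The Illinois Catholic Historical Society published the review. "
    "The Historical Society met in Chicago."
)
print(collocates(sample, "historical", window=1).most_common(3))
```

A wider window catches looser associations but also more noise, which is one reason raw collocate counts are best read alongside the passages themselves.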

Textual Arc was very confusing. It is defined as “a visualization of the terms in a document that includes a weighted centroid of terms and an arc that follows the terms in document order.” In other words, the text of the document forms the outside arc, and all the words inside are sized according to frequency. A line then bounces around the words and “reads” the text. While visually pleasing, it is difficult to see what exactly this tool is useful for.

Voyant Tools and text-mining are powerful tools. However, these methods do come with drawbacks. The most significant is that I did not actually read these documents, all 133,000 words. So, while I have statistics on the journals, I do not actually know what they say. On the flip side, Voyant Tools allows me to quickly scan for specifics. It also reveals information, such as the references to Kaskaskia. While these tools are powerful and allow for rapid statistics, they do not substitute for reading the text. They should be used to supplement a project, not be the focus of a project. This way, they can help us keep history alive.






3 responses to “Digital Humanities Text Analysis”

  1. Harrison

    I love how your use of textual analysis led you to find out about the existence of Native American villages that Catholic missionaries were active in. It really helps to solidify the fact that Catholic parties were in the Chicago area almost from the beginning of the European presence in the region. While Catholics were never a majority in the Chicago area, they’ve been an active force almost from the beginning and your analysis proves that. Great work as always, Brian!

  2. Chris Cantwell

    This is a great post. I appreciate how you used the tools to learn something about the text. The next step that’s missing here is to then start interrogating the text. With Kaskaskia, for example, are there certain tropes or words you can pull out about how the journal discussed Native Americans? That could provide some insight.

  3. Dariel Chaidez

    This is a great blog post! I love how you organized and used the tools to come to a deeper understanding of the source material! Great insight on Voyant and how the tools could be incorporated in this text analysis.
