Text analysis calculates statistics for large bodies of text, from which historians can then draw conclusions. For this assignment and post, I used Voyant Tools, a free web-based platform, to analyze text from two journals. Another common term for this process is text mining.
For my analysis, I used the Illinois Catholic Historical Review. Specifically, I chose Volume 1, No. 1 and Volume 1, No. 2, both from 1918. I entered both into Voyant Tools using the full-text option, which brought me to Voyant Tools' default analysis page. The top-left panel, called Cirrus, produces word clouds, which show the most frequent words: the bigger the word, the more often it appears. I then removed many frequent words, such as father, rev, Illinois, Chicago, st, Catholic, church, la, louis, bishop, history, and historical. I removed these because they are common words that one would expect to find in a historical Catholic journal from Chicago. After this, I was left with the following word cloud:
This revealed some interesting facets of the journal. The most interesting revelation was the prominence of the words Kaskaskia and Gibault. I then used the Contexts tool to look into Kaskaskia further. I discovered that it refers to a Native American village in Illinois where the Jesuits had a mission (for more information, see here).
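The Contexts tool is a keyword-in-context (KWIC) view: every occurrence of a search term is shown with a few words on either side. The sketch below illustrates that idea under my own assumptions. The function names, the simple lowercase tokenizer, and the made-up two-sentence sample (not text from the journal) are all hypothetical, and Voyant's own tokenization may differ.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_in_context(text, keyword, window=5):
    """Return one line per occurrence of keyword, with up to `window` tokens on each side."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}")
    return lines

# Made-up sample text for illustration only.
sample = ("The Jesuits founded a mission at Kaskaskia. "
          "Later, Father Gibault served the people of Kaskaskia.")
for line in keyword_in_context(sample, "Kaskaskia", window=3):
    print(line)
```

Reading these snippets is what turns a big word in the cloud into something interpretable, which is how the village and its mission came to light.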
The other default panels include the Reader, which shows word frequencies while you read the text, and the Summary, which gives a broad overview of both documents. The Summary revealed that Voyant Tools analyzed just over 133,000 words to build these statistics! That is a staggering number of words; generating similar statistics by hand would have taken me years.
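Both the Summary's word count and the Cirrus word cloud come down to counting tokens, with the cloud skipping words on a stopword list (like the ones I removed). Here is a minimal sketch of that counting, assuming a simple regex tokenizer; the `summarize` function, the stopword set, and the sample sentence are hypothetical, and Voyant's real counts may differ because it tokenizes text its own way.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def summarize(text, stopwords=frozenset(), top_n=5):
    """Return total and unique word counts, plus the top words once stopwords are removed."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t not in stopwords)
    return {
        "total_words": len(tokens),
        "unique_words": len(set(tokens)),
        "top_words": counts.most_common(top_n),
    }

# Made-up sample; "bishop" is dropped the way I dropped expected words from the cloud.
sample = ("The bishop wrote to the priest at Kaskaskia, "
          "and the priest wrote back from Kaskaskia.")
stop = {"the", "to", "at", "and", "from", "bishop"}
print(summarize(sample, stopwords=stop, top_n=3))
```

Note that the stopwords only affect the top-word list; the total still counts every word, just as the Summary's 133,000 figure is independent of what I hid in Cirrus.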
Next, I wanted to see what kinds of people the journal discussed. The easiest way to do this was to search for people's titles and graph them with the Trends tool, which produced the following graph:
The graph shows that priests (father or rev) were mentioned most often, followed by bishops (bishop), laymen (mr), and laywomen (mrs). The data shows a heavy slant toward clergy.
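The title search above groups several terms into one category (father and rev both count as priests) and compares the counts across documents. A rough sketch of that grouping, assuming a simple tokenizer: the `CATEGORIES` mapping, the function names, and the two sample strings are hypothetical stand-ins, not the journal text or Voyant's implementation, and this version reports raw counts rather than whatever scaling the Trends graph applies.

```python
import re
from collections import Counter

# Hypothetical grouping of title terms into categories, mirroring my searches.
CATEGORIES = {
    "priests": {"father", "rev"},
    "bishops": {"bishop"},
    "laymen": {"mr"},
    "laywomen": {"mrs"},
}

def tokenize(text):
    """Lowercase and split into word tokens; "Rev." and "Mr." become "rev" and "mr"."""
    return re.findall(r"[a-z0-9']+", text.lower())

def title_counts(text):
    """Count how often each category's terms appear in one document."""
    tokens = Counter(tokenize(text))
    return {cat: sum(tokens[t] for t in terms) for cat, terms in CATEGORIES.items()}

def trends(documents):
    """Map each document name to its per-category counts."""
    return {name: title_counts(text) for name, text in documents.items()}

def totals(trend_table):
    """Sum the categories across all documents, most frequent first."""
    total = Counter()
    for counts in trend_table.values():
        total.update(counts)
    return total.most_common()

# Made-up sample documents for illustration only.
docs = {
    "No. 1": "Rev. Smith met Father Jones and Bishop Brown.",
    "No. 2": "Father Jones wrote to Mr. and Mrs. Green.",
}
table = trends(docs)
print(table)
print(totals(table))
```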
That covers the five default panels in Voyant Tools. I also experimented with three other tools. The first, and coolest, is Dreamscape, which finds place names in the text and plots them on a world map. Powerful as it is, the tool has limits. For example, these journals contain histories related to Fr. Jacques Marquette, SJ, and Dreamscape interpreted every reference to him as Marquette, MI, the city named after him.
The last two tools I used were Collocates and Textual Arc. Collocates shows which words frequently appear close together. For example, it showed that "historical" and "society" appear together 124 times.
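Collocation counting works by looking at a small window of words around each occurrence of a keyword and tallying what shows up there. The sketch below is my own approximation of that idea, assuming a fixed window on each side; the `collocates` function, the window size, and the sample sentence are hypothetical, and Voyant's default window and tokenization may differ.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def collocates(text, keyword, window=5):
    """Count words appearing within `window` tokens on either side of each keyword occurrence."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts

# Made-up sample for illustration only.
sample = ("The Illinois Catholic Historical Society met. "
          "The historical society published a review.")
print(collocates(sample, "historical", window=1).most_common())
```

A pair like "historical" and "society" scores high simply because the society's name repeats throughout the journals.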
Textual Arc was very confusing. Voyant describes it as "a visualization of the terms in a document that includes a weighted centroid of terms and an arc that follows the terms in document order." In other words, the document's text runs around the outer arc, and the words inside are sized according to frequency. A line then bounces between the words and "reads" the text. It is visually pleasing, but its practical uses are hard to see.
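One way to read "weighted centroid of terms" is: lay the words out around a circle in reading order, then place each distinct term at the average position of all its occurrences. The sketch below follows that interpretation, which is my assumption rather than Voyant's actual code; the `arc_layout` function and the tiny sample text are hypothetical.

```python
import math
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def arc_layout(text):
    """Place tokens evenly around a unit circle in reading order, then return each
    term's centroid (mean x, mean y of its occurrences) and its frequency."""
    tokens = tokenize(text)
    n = len(tokens)
    points = defaultdict(list)
    for i, token in enumerate(tokens):
        angle = 2 * math.pi * i / n  # position of this token on the outer arc
        points[token].append((math.cos(angle), math.sin(angle)))
    layout = {}
    for term, pts in points.items():
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        layout[term] = (cx, cy, len(pts))  # frequency could drive the word's size
    return layout

for term, (x, y, freq) in arc_layout("a b a c").items():
    print(f"{term}: ({x:.2f}, {y:.2f}) x{freq}")
```

Under this reading, a word used evenly throughout the text drifts toward the center, while a word concentrated in one section sits near that part of the arc, which hints at where in a document a topic is discussed.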
Voyant Tools and text mining are powerful, but they come with drawbacks. The most significant is that I never actually read these documents, all 133,000 words of them. So, while I have statistics on the journals, I do not really know what they say. On the flip side, Voyant Tools let me quickly scan for specifics, and it brought information to light, such as the references to Kaskaskia. These tools produce statistics rapidly, but they are no substitute for reading the text. They should supplement a project, not be its focus. Used this way, they can help us keep history alive.