UCLDH Blog

#DH2010: Introduction to Text Analysis using Voyeur workshop

By Claire S Ross, on 8 July 2010

Yesterday I took part in a really interesting workshop on text analysis tools, part of the Digital Humanities 2010 conference. I have to admit that I had used Voyeur before. Or rather, I had tried to use it, didn’t know what I was doing, pressed a couple of buttons and then gave up. So attending the workshop was pretty essential, and it was indeed a great help. Voyeur is a web-based text analysis tool that gives you a number of panels of information about whatever text you put into it, from a summary of the corpus or document to distribution graphs. In the workshop we were taken through the different panels and the capabilities of each one.

Stéfan Sinclair and Geoffrey Rockwell took us through how to use Voyeur with a single text and with a corpus, and then showed us some of the advanced features.

Firstly we were let loose on a version of Mary Shelley’s Frankenstein. I decided to look for the distribution of the words Human, Despair, Happy and Monster, because I think that’s a pretty nice statement. And here are the results:

Sinclair, S. and G. Rockwell (2010). Word Trends. Voyeur. Retrieved July 7, 2010 from http://voyeurtools.org/tool/TypeFrequenciesChart/
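As a rough illustration of what a word-trends graph like this plots, here is a minimal Python sketch. It is not Voyeur's implementation: the function name `term_distribution`, the very simple tokenizer and the choice of equal-sized segments are all assumptions made for the example, and the sample string is made up rather than taken from Frankenstein.

```python
import re
from collections import Counter


def term_distribution(text, terms, segments=10):
    """Count how often each term occurs in each equal-sized segment of a text.

    Hypothetical helper for illustration only; not part of Voyeur.
    """
    # Lowercase the text and split it into word tokens (deliberately naive).
    tokens = re.findall(r"[a-z']+", text.lower())
    wanted = {t.lower() for t in terms}

    # One counter per segment of the document.
    counts = [Counter() for _ in range(segments)]
    for i, tok in enumerate(tokens):
        if tok in wanted:
            # Map the token position onto a segment index in 0..segments-1.
            seg = i * segments // len(tokens)
            counts[seg][tok] += 1

    # For each term, return its count in every segment, in document order.
    return {t: [c[t] for c in counts] for t in sorted(wanted)}


# Made-up sample text, split into two halves.
sample = "monster happy human human despair monster monster happy"
trend = term_distribution(sample, ["Human", "Despair", "Happy", "Monster"], segments=2)
for term, per_segment in trend.items():
    print(term, per_segment)
```

Plotting each of those lists as a line, with the segment number on the x-axis, gives the same kind of picture as the chart: where in the text a word clusters and where it disappears.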

During the session, I uploaded a copy of the DH2010 conference abstracts and attempted some analysis on them (you can see the Cirrus wordle created with Voyeur here). I decided to look for the distribution of the words Museums, Libraries and Archives, because that’s what I’m mostly interested in, and the results were pretty interesting.

In a corpus of 226,593 words in total (20,772 of them unique), museums is mentioned 21 times, museum 51 times, library 166 times, libraries 83 times, archives 99 times and archive 81 times.

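| Type | Count | Z-Score | Difference | Relative | Std. Dev. | Peakedness | Skew |
|---|---|---|---|---|---|---|---|
| archives | 99 | 0.6 | 0 | 4.37 | 6.624 | -1.88 | 0.03 |
| libraries | 83 | 0.49 | 0 | 3.66 | 4.877 | 0.45 | 0.79 |
| museums | 21 | 0.07 | 0 | 0.93 | 2.961 | 2.58 | 1.64 |
| archive | 81 | 0.48 | 0 | 3.57 | 6.118 | 5.79 | 2.33 |
| library | 166 | 1.06 | 0 | 7.33 | 8.329 | -0.06 | 0.52 |
| museum | 51 | 0.27 | 0 | 2.25 | 6.887 | 3.97 | 2.02 |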
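The Relative column is not explained in the tool's output, but the numbers are consistent with a count per 10,000 words of the corpus. That reading is my inference from the figures, not something confirmed by Voyeur. A quick Python check, using only the counts and corpus size quoted in this post (the names `per_10k` and `relative` are just for this example):

```python
# Corpus size and raw counts as reported in this post.
TOTAL_WORDS = 226_593
counts = {
    "archives": 99,
    "libraries": 83,
    "museums": 21,
    "archive": 81,
    "library": 166,
    "museum": 51,
}


def per_10k(count, total=TOTAL_WORDS):
    """Occurrences per 10,000 words, rounded to two decimals.

    Assumption: this is how the Relative column is derived.
    """
    return round(count / total * 10_000, 2)


relative = {term: per_10k(n) for term, n in counts.items()}
for term, value in relative.items():
    print(f"{term:<10} {value}")
```

Under that assumption the script reproduces the Relative values in the table, for example 4.37 for archives and 0.93 for museums.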

If you then compare the term museum(s) with the term text(s), the gap is striking: text and texts together appear 1,080 times, against 72 for museum and museums.

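| Type | Count | Z-Score | Difference | Relative | Std. Dev. | Peakedness | Skew |
|---|---|---|---|---|---|---|---|
| text | 701 | 4.72 | 0 | 30.94 | 18.046 | -0.51 | -0.54 |
| texts | 379 | 2.52 | 0 | 16.73 | 11.12 | 0.1 | 0.57 |
| museums | 21 | 0.07 | 0 | 0.93 | 2.961 | 2.58 | 1.64 |
| museum | 51 | 0.27 | 0 | 2.25 | 6.887 | 3.97 | 2.02 |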

What does this mean for museum research and discussion in the digital humanities? Is it sidelined? Are textual studies prevalent in DH for a reason? Or is it just semantics?