workshop | UCL UCLDH Blog

#DH2010: Introduction to Text Analysis using Voyeur workshop

By Claire S Ross, on 8 July 2010

Yesterday I took part in a really interesting workshop on text analysis tools, part of the Digital Humanities 2010 conference. I have to admit I had used Voyeur before. Well, I say used: I tried to use it, didn’t know what I was doing, pressed a couple of buttons and then gave up. So attending the workshop was pretty essential, and it was indeed a great help. Voyeur is a web-based text analysis tool that gives you a number of different panels of information about whatever text you put into it, from a summary of the corpus or document to distribution graphs. In the workshop we were taken through the different panels and the capabilities of each one.

Stéfan Sinclair and Geoffrey Rockwell took us through how to use Voyeur with a single text, with a corpus and then showed us some of the advanced features.

First we were let loose on a version of Mary Shelley’s Frankenstein. I decided to look for the distribution of the words Human, Despair, Happy and Monster, because I think that’s a pretty nice statement. Here are the results:

Sinclair, S. and G. Rockwell (2010). Word Trends. Voyeur. Retrieved July 7, 2010 from http://voyeurtools.org/tool/TypeFrequenciesChart/
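
For anyone curious what a distribution chart like this involves, here is a minimal Python sketch of the general idea: split the text into equal slices and count each word in each slice. This is not Voyeur's own code. The function name `term_distribution` and the default of ten segments are my assumptions for illustration.

```python
import re
from collections import Counter


def term_distribution(text, terms, segments=10):
    """Count each term in `segments` roughly equal slices of the text.

    Hypothetical helper for illustration; not how Voyeur itself is implemented.
    """
    # Lowercase everything so that "Monster" and "monster" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division, so every word falls into one of the segments.
    size = max(1, -(-len(words) // segments))
    counts = {term.lower(): [0] * segments for term in terms}
    for i, word in enumerate(words):
        if word in counts:
            counts[word][min(i // size, segments - 1)] += 1
    return counts


sample = "The monster was happy. Despair, despair, human!"
result = term_distribution(sample, ["Human", "Despair", "Happy", "Monster"], segments=2)
```

Each list in `result` holds one count per slice, which is the sort of series you could plot as a trend line.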

During the session, I uploaded a copy of the DH2010 conference abstracts and attempted some analysis on them (you can see the Cirrus wordle created with Voyeur here). I decided to look for the distribution of the words Museums, Libraries and Archives, because that’s what I’m mostly interested in, and the results are really interesting.

Out of the corpus with a total of 226,593 words and 20,772 unique words, museums are mentioned 21 times, museum 51 times, library 166, libraries 83, archives 99, and archive 81.

Type	Count	Z-Score	Difference	Relative	Std. Dev.	Peakedness	Skew
archives	99	0.6	0	4.37	6.624	-1.88	0.03
libraries	83	0.49	0	3.66	4.877	0.45	0.79
museums	21	0.07	0	0.93	2.961	2.58	1.64
archive	81	0.48	0	3.57	6.118	5.79	2.33
library	166	1.06	0	7.33	8.329	-0.06	0.52
museum	51	0.27	0	2.25	6.887	3.97	2.02
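
The Relative column looks like the count per 10,000 words of the corpus. That is my inference from the numbers rather than anything stated in the workshop, but a quick Python check reproduces the column from the counts and the corpus size given above. The names `TOTAL_WORDS` and `relative_per_10k` are mine.

```python
TOTAL_WORDS = 226_593  # corpus size reported above

counts = {
    "archives": 99,
    "libraries": 83,
    "museums": 21,
    "archive": 81,
    "library": 166,
    "museum": 51,
}


def relative_per_10k(count, total=TOTAL_WORDS):
    """Occurrences per 10,000 words, rounded to two decimals (assumed definition)."""
    return round(count / total * 10_000, 2)


relative = {term: relative_per_10k(n) for term, n in counts.items()}
```

The values in `relative` match the table, for example 4.37 for archives and 7.33 for library.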

Then if you compare the term museum(s) with the term text(s), the gap is striking:

Type	Count	Z-Score	Difference	Relative	Std. Dev.	Peakedness	Skew
text	701	4.72	0	30.94	18.046	-0.51	-0.54
texts	379	2.52	0	16.73	11.12	0.1	0.57
museums	21	0.07	0	0.93	2.961	2.58	1.64
museum	51	0.27	0	2.25	6.887	3.97	2.02
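
To put a number on that comparison, the singular and plural forms can be added together. A small Python sketch using only the counts from the table; the grouping labels are my own:

```python
counts = {"text": 701, "texts": 379, "museums": 21, "museum": 51}

# Group singular and plural forms under one label (labels chosen for illustration).
groups = {
    "text(s)": ["text", "texts"],
    "museum(s)": ["museum", "museums"],
}

totals = {label: sum(counts[form] for form in forms) for label, forms in groups.items()}
ratio = totals["text(s)"] / totals["museum(s)"]
```

That gives 1,080 mentions of text(s) against 72 of museum(s), so text(s) turns up 15 times as often in the abstracts.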

What does this mean for museum research and discussion in the digital humanities discipline? Is it sidelined? Are textual studies in DH prevalent for a reason? Or is it just semantics?

Filed under Conferences

Tags: #dh2010, conference abstracts, textual analysis, voyeur, workshop
