
UCL Centre for Digital Humanities



Born Digital: The 21st Century Archive in Practice and Theory – DH2010

Sarah Davenport, 18 July 2010

I recently worked as a student assistant at the Digital Humanities 2010 conference, where I was at times entrusted with helping to video some of the panel and parallel sessions. One of these was of particular interest to my studies: Born Digital: The 21st Century Archive in Practice and Theory. Thoughts turned to the masses of born-digital material, from early word processing files to blogs and tweets, that will become the archives of the future. Universities have only recently begun to document not only the digital content but also the hardware used to create it.

I will focus here on a paper delivered by Erika Farr, “Finding aids and files directories: researching a 21st Century Archive”, a case study of Salman Rushdie’s hybrid archive held by Emory Libraries. The manuscript, archives and rare books library there had been presented with a new preservation challenge in the form of three laptop computers, one Mac, one external hard drive and paper materials, including doodles and writings, all from the popular author. In 2006, this was the first time that the library had acquired whole computers. A collaborative effort between archivists, digital librarians and software engineers was required, with an emphasis on the library’s responsibility to the researcher when given the opportunity to harvest such unique data and images. There was also a need to respect the hybrid nature of the collection when dealing with the personal and financial material of a living author. This led to a unique policy development and triage stage, in which it was agreed that there would be no web access to the collection and, understandably, no access to any materials that had been deleted.

The product of this work, the Salman Rushdie Archives, is a fascinating project encompassing 15 years of Salman Rushdie’s life. Born-digital content is placed in an emulated environment, ‘as is’, creating a researcher experience that demonstrates how he used the computers, how he organised and labelled his folders, and which applications he chose. Researchers can begin to understand the human interaction with the computer and how this affected the creative process; the task is not simply to capture the content of a document but also the setting in which it was created. Early user testing is said to have found that this influenced the use of keywords and search techniques. It is hoped that further user testing can begin as the collection is used in the future. For now, the researcher can begin to create a critical analysis of the emulated environment, as introduced by Erika Farr in The Emulated Environment: Exploring Rushdie’s Desktop. The ‘efforts to capture the holders images and efforts to create tools to leverage that data’ will be an inspiration to research in the future.

I am grateful to have been selected for a student assistant bursary at DH2010, as this was one of many sessions that have inspired my dissertation, which looks into the implementation of metadata standards in sound archives. As part of my dissertation, I am currently researching the resource discovery mechanisms of the Archival Sound Recordings website. This will involve user testing, and I am looking for volunteers to take part. Each session will take no longer than 45 minutes and involves an observation of how you use the website and a short questionnaire. If you would be interested in helping me with this study, please contact stephen.anderson.09@ucl.ac.uk.


Cosmo Anderson is a guest author on this blog and a student on the MA Library and Information Studies programme here at UCL.

UCLDH (and DH) in the news

Anne Welsh, 15 July 2010

Times Higher Education today carries a report on UCLDH Deputy Director Dr Melissa Terras’s closing plenary speech at DH2010 on Saturday.

Reporter Sarah Cunnane focuses on Mel’s call for members of the DH community “to sharpen up on the web or lose out” (THE).

This was only one aspect of the 40-minute plenary, which used the Transcribe Bentham project as a vehicle to highlight the achievements of Digital Humanists, and the challenges they have faced, in the last decade as well as in the future.

This was the first time the Digital Humanities conference had invited a member of its own community to give the closing keynote. The speech received a tremendous response from those present at the time and from those who read Mel’s notes on her blog and/or watched the video on arts-humanities.net. You can read the tweets at #dh2010.

The issues raised by Mel in her plenary will also be the topic under discussion at the next Decoding Digital Humanities meeting.

Image: Simon Mahony, originally posted to arts-humanities.net

#DH2010: Introduction to Text Analysis using Voyeur workshop

Claire SRoss, 8 July 2010

Yesterday I took part in a really interesting workshop looking at text analysis tools as part of the Digital Humanities 2010 conference. Now I have to admit, I have used Voyeur before. I say used; I mean I tried to use it, didn’t know what I was doing, pressed a couple of buttons and then gave up. That suggested it was pretty essential that I attended the workshop, and it was indeed a great help. Voyeur is a web-based textual analysis tool that provides a number of different panels of information about whatever text you put into it, from a summary of the corpus or document to distribution graphs. In the workshop we were taken through the different panels and the capabilities of each one.

Stéfan Sinclair and Geoffrey Rockwell took us through how to use Voyeur with a single text and with a corpus, and then showed us some of the advanced features.

Firstly we were let loose on a version of Mary Shelley’s Frankenstein. I decided to look for the distribution of the words Human, Despair, Happy and Monster, because I think that’s a pretty nice statement. And here are the results:

Sinclair, S. and G. Rockwell (2010). Word Trends. Voyeur. Retrieved July 7, 2010 from http://voyeurtools.org/tool/TypeFrequenciesChart/
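For anyone wondering what a distribution chart like this is actually counting, here is a minimal Python sketch. It is not Voyeur's own code: the function name `word_trend`, the simple tokenizer and the default of ten equal segments are all my assumptions, used only to illustrate the idea of splitting a text into segments and counting a word in each one.

```python
import re


def word_trend(text, word, segments=10):
    """Count occurrences of `word` in each of `segments` equal slices of `text`.

    Hypothetical helper for illustration; not part of Voyeur.
    """
    # Lowercase and split into simple word tokens (an assumed tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    target = word.lower()
    for i, token in enumerate(tokens):
        if token == target:
            # Map the token's position to the segment it falls in.
            counts[i * segments // len(tokens)] += 1
    return counts


# Tiny made-up sample, split into two halves.
sample = ("the monster was happy then the monster felt despair "
          "and the human fled")
monster_trend = word_trend(sample, "monster", segments=2)
despair_trend = word_trend(sample, "despair", segments=2)
```

Plotting one such list per word (Human, Despair, Happy, Monster) against the segment number would give a chart of the same general shape as the one above.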

During the session, I uploaded a copy of the DH2010 conference abstracts and attempted to do some analysis on them (you can see the Cirrus wordle created with Voyeur here). The results were pretty interesting. I decided to look for the distribution of the words Museums, Libraries and Archives, because that’s what I’m mostly interested in. Again, the results are really interesting.

Out of the corpus with a total of 226,593 words and 20,772 unique words, museums are mentioned 21 times, museum 51 times, library 166, libraries 83, archives 99, and archive 81.

Type       Count  Z-Score  Difference  Relative  Std. Dev.  Peakedness  Skew
archives      99     0.6            0      4.37      6.624       -1.88  0.03
libraries     83     0.49           0      3.66      4.877        0.45  0.79
museums       21     0.07           0      0.93      2.961        2.58  1.64
archive       81     0.48           0      3.57      6.118        5.79  2.33
library      166     1.06           0      7.33      8.329       -0.06  0.52
museum        51     0.27           0      2.25      6.887        3.97  2.02
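The Relative column appears to be the raw count scaled to occurrences per 10,000 words of the corpus. That is my reading of the numbers, not something stated in the workshop, but it reproduces every value in the table. A short Python sketch, where the names `TOTAL_WORDS`, `counts` and `per_10k` are mine:

```python
# Corpus size and raw counts as reported above.
TOTAL_WORDS = 226_593
counts = {
    "archives": 99,
    "libraries": 83,
    "museums": 21,
    "archive": 81,
    "library": 166,
    "museum": 51,
}


def per_10k(count, total=TOTAL_WORDS):
    """Occurrences per 10,000 words (assumed meaning of 'Relative')."""
    return round(count * 10_000 / total, 2)


relative = {term: per_10k(n) for term, n in counts.items()}

# Singular and plural forms combined, for a fairer comparison.
combined = {
    "museum(s)": counts["museum"] + counts["museums"],
    "library/libraries": counts["library"] + counts["libraries"],
    "archive(s)": counts["archive"] + counts["archives"],
}
```

Combining singular and plural gives 72 mentions for museum(s), 249 for library/libraries and 180 for archive(s).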

Then if you compare the term museum(s) with the term text, again the results are quite interesting.

Type       Count  Z-Score  Difference  Relative  Std. Dev.  Peakedness  Skew
text         701     4.72           0     30.94     18.046       -0.51  -0.54
texts        379     2.52           0     16.73     11.12         0.1    0.57
museums       21     0.07           0      0.93      2.961        2.58   1.64
museum        51     0.27           0      2.25      6.887        3.97   2.02

What does this mean for museum research and discussion in the digital humanities discipline? Is it sidelined? Are textual studies in DH prevalent for a reason? Or is it just semantics?

Cirrus: A Cirrus type wordle of the #DH2010 conference abstracts

Claire SRoss, 7 July 2010

Look what I made in the Introduction to Text analysis using Voyeur workshop: A Cirrus type wordle of the #DH2010 conference abstracts.

Sinclair, Stéfan and Geoffrey Rockwell. “Cirrus.” Voyeur. 7 Jul. 2010 <http://voyeur.hermeneuti.ca/tool/Cirrus/../../tool/Cirrus/>

Sinclair, S. and G. Rockwell (2010). Cirrus. Voyeur. Retrieved July 7, 2010 from http://voyeur.hermeneuti.ca/tool/Cirrus/../../tool/Cirrus/