By Tim Causer, on 1 June 2015
Below is a guest post from Milandeep Singh, a UCL Digital Humanities MSc student, who has been working with us during the past three weeks. It has been a pleasure to host Milandeep, and here she describes what she has been up to during her time with us.
Hello and welcome to a personal update on behind-the-scenes work at Transcribe Bentham. As a current MSc Digital Humanities placement student, I’ve had the honour to be supervised by Dr Tim Causer to work on this amazing project. I cannot believe that this is already the third week of placement! I have had the privilege to digitise Bentham manuscripts, process them, transcribe them, use the latest Handwritten Text Recognition and Transkribus software, and proof-check transcribers’ submissions and encode these in XML, and it has been a fairly steep learning curve.
A special thanks to all the Transcribe Bentham project volunteers for the steady progress made on the project. Thankfully, a lot more manuscripts have been transcribed and checked since I’ve had the pleasure to work on this wonderful initiative.
One of the most daunting tasks I encountered was transcribing Bentham’s manuscripts from Box 150 on the first day. I had no previous experience of working with manuscripts, and learning that Bentham had written 60,000 manuscripts was a real surprise! It is unbelievable that a busy, incredibly talented individual like Bentham would have the time to write an estimated 30,000,000 words! Between us, reading Bentham’s handwriting, and coping with his newly invented words, spelling errors and blotches of ink, is indeed a major skill in itself.
However, Bentham’s ideas are fascinating and the phenomenal effort volunteer transcribers display on a daily basis is highly commendable, and evident in the quantity and quality of contributions every week. This consistent, high rate of transcription is testament to the immense amount of work the volunteers have put in. The most heavily transcribed boxes of material during the three weeks of my time here were boxes 150, 538, and 541. In Bentham’s own words, your effort gives ‘the greatest happiness’ and undoubtedly encourages everyone else.
Prior to my placement on Transcribe Bentham, I had no idea just how dedicated the volunteer transcribers were, nor how much goes on behind the scenes. I am grateful to UCL Creative Media Services for allowing me to visit their digitisation suite, single-handedly digitise one hundred untranscribed manuscripts from Box 64, and electronically process them! Digitisation takes place in a dark room, with minimal light and humidity to ensure the manuscripts are looked after. I must say, this was an interesting experience, so do keep an eye out for when the Box 64 manuscripts are uploaded to the Transcription Desk! The manuscripts I was working with each had at least one pin through them, and many were grouped together by an old paperclip, probably dating from when a cataloguer took on the daunting task of sorting Bentham’s manuscripts.
Removing each pin and paperclip was no easy task, and required extreme care to avoid doing any damage to the precious manuscripts. Many of these paperclips were bent and had caused some curling to the manuscripts, so I used a lifesaving tool, the ‘Munich Finger’. This non-reflective, transparent tool saved me from taking countless photographs of a single manuscript in an attempt to improve readability. Just by placing this tool on the edge of each manuscript, the appearance of each manuscript improved dramatically when photographed. Digitisation was, I found, an extremely satisfying task.
The most challenging part of the digitisation process for this particular set of manuscripts was photographing those with the least writing. To overcome this issue, one had to play around with the camera settings, including the ISO, each time one was faced with such a manuscript. This required special attention. Processing of the images was much more straightforward, as one simply rotated the images and labelled them using the batch processing feature of the Lightbox software. This process also involved using Photoshop to crop the images and remove unnecessary shadows, to maximise the readability of each manuscript. Once the images were processed, they were compressed to a suitable format to be uploaded to the Bentham image server and, ultimately, displayed on Transcribe Bentham for volunteers to view.
Besides this, another interesting subject I encountered whilst working on the project is the Handwritten Text Recognition (HTR) technology being developed for tranScriptorium. It is astonishing that this technology can achieve up to 90% accuracy for Bentham images (at least for neatly written manuscripts). I look forward to seeing how volunteers embrace this amazing technology in the future. As you may be aware, some public trials are being conducted using the TSX tool, and I did have the pleasure of using it. Transcribing with the assistance of this technology is fantastic! In order to enable the HTR technology to display accurate results for each manuscript line, the manuscripts are initially handled in a behind-the-scenes management tool known as Transkribus. Using this tool, we draw text regions, text lines and baselines on each line of the manuscript. At first, this may sound incredibly easy; however, this process is semi-automated, and correcting erroneous baselines can take a lot of concentration and effort to ensure there is no overlap with other lines. Once each batch has been baselined, it is then sent abroad to the Spanish computer scientists, who generate word clouds based on a particular set, to be used to generate accurate results in the HTR.
Nevertheless, my favourite task while on my placement was checking the encoding of transcripts submitted by volunteer transcribers. It is truly rewarding to finalise the outstanding work of volunteers, lock the transcripts, save the encoded XML file, and send a message of thanks to the transcriber. This triumph would not have been possible without the sheer hard work, enthusiasm and dedication of Transcribe Bentham’s brilliant volunteers. Once again, my most sincere thanks go to all of them for their contributions to Transcribe Bentham’s ongoing success.
Milandeep Kaur Singh