Transcribing Bentham … with the help of a machine?

By Tim Causer, on 9 July 2015

Since the start of 2013 we – we being the Bentham Project, the UCL Centre for Digital Humanities, and the University of London Computer Centre (ULCC) – have participated in the exciting EC-funded tranScriptorium project with some fantastic colleagues from around Europe. The tranScriptorium consortium’s key aim is to develop and further handwritten text recognition (HTR) technology which can index, search, and transcribe historic manuscript images.

When we were invited to take part in the project way back in 2012, the idea that a machine might be able to automatically transcribe a Bentham manuscript seemed, to us – knowing very little about HTR – a little fanciful. And yet here we are, with experiments carried out by our colleagues showing transcription word error rates of between 15% and 8% for some manuscripts, admittedly on some of the less complex Bentham papers, but remarkable nevertheless.
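For readers unfamiliar with the measure, a word error rate is usually calculated as the number of word-level substitutions, deletions, and insertions needed to turn the machine's transcript into the correct one, divided by the number of words in the correct transcript. The short Python sketch below illustrates that standard definition only; it is not the evaluation code used by our tranScriptorium colleagues, and the function name and example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: assumes whitespace tokenisation and exact,
    case-sensitive word matching, which a real evaluation may not use.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # words match, or one is substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one wrong word in a ten-word line.
correct = "the quick brown fox jumps over the lazy dog today"
machine = "the quick brown fax jumps over the lazy dog today"
print(f"{word_error_rate(correct, machine):.0%}")  # prints "10%"
```

On this definition, a word error rate of 8% means that roughly one word in every twelve or thirteen needs correcting.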

Our main task in tranScriptorium is to bring this technology to users, by developing and testing a crowdsourced transcription platform, known as TSX. This platform, put together by ULCC, incorporates the HTR technology and puts the user firmly in control of it, letting you take part in three interconnected ways depending upon your level of experience, preference, and/or amount of available free time.

In the first instance you can transcribe and encode a manuscript as you would in Transcribe Bentham, while taking advantage of useful features such as the segmentation of manuscript images into lines, and colour-coded TEI mark-up, which makes it more straightforward to distinguish the mark-up from the text of your transcript.

TSX: transcription and encoding

If you are new to transcription or don’t have much time to spare, you might like to request from the HTR engine a full transcript of a given manuscript. As it is unlikely that any HTR transcript will ever be entirely right, you can then correct it against the image.

TSX: HTR transcript correction

Finally, and perhaps most excitingly, there is the facility to request from the HTR engine suggestions for certain words. There is nothing more frustrating when transcribing than coming across a word (or several words!) which you can’t decipher, and then losing the flow and context of what follows. Being able to ask the HTR engine for possible suggestions could help fill in at least some of these gaps and reduce the level of frustration when trying to decipher just what Bentham wrote.


TSX: word suggestions

If you would like to try out transcribing Bentham manuscripts using TSX and its HTR technology, please do visit the website. (TSX currently works best running in Chrome on Windows 7 and upwards. There are known issues with it running on MacOS – please do bear with us.)

We must stress that this HTR technology could never replace volunteer transcribers or their superb work, but it may help to make their lives a little easier. Future development of the platform will include improving the way in which word suggestions are delivered to users, adding a user page to keep track of your contributions, and introducing a what-you-see-is-what-you-get transcription interface in which the TEI mark-up is hidden from view, leaving you to concentrate on transcription alone. A forthcoming follow-on project to tranScriptorium also promises many more exciting developments, but that’s for another day…

We are very excited about the potential for HTR in widening access to digitised manuscript material, and hope that it – and the Transkribus infrastructure and tool developed by our colleagues at the University of Innsbruck, on which TSX runs – will support scholars and institutions in establishing their own crowdsourcing initiatives.

We hope in the meantime that you enjoy trying out TSX. Please do let us know if you have any feedback on it (good or bad!).