The Bentham Project’s ‘Legacy Transcripts’

By Tim Causer, on 12 November 2010

Vital to the Bentham Project’s aim of producing a new and authoritative edition of The Collected Works of Jeremy Bentham are transcriptions of manuscripts written by the man himself. As noted in an earlier post, the volume of material produced by Bentham is daunting – UCL’s Special Collections holds around 60,000 folios across 176 boxes, while the British Library holds another 12,500 folios. Thanks to the indefatigable Bentham Project staff past and present, around 20,000 folios have been transcribed during the last 25 years.

Transcription has taken various forms at the Bentham Project. At first, it was done by hand and the transcripts typed up on a mechanical typewriter, until the Project’s staff found themselves at the technological cutting edge upon the arrival of an electronic typewriter. For younger scholars, it is hard to imagine how problematic and time-consuming transcription can be without our trusty laptops.

More recently, this work has been carried out using a variety of word processors, such as Corel WordPerfect, and now Microsoft Word. However, unlike those produced through Transcribe Bentham, Word-produced transcriptions tend not to replicate all of the numerous deletions, insertions and marginalia which litter Bentham’s manuscripts and make them such complex documents. This is with good reason: the Bentham Project’s intention is to produce a Collected Works which is as close to Bentham’s intended vision as possible and which reads coherently, a purpose to which such markings are usually not conducive.

An example of a transcription from which the Bentham Project editors work.

While they suit the purposes of the Bentham Project editors admirably, these Word files do have drawbacks: they are produced using proprietary software and are thus not as flexible as XML files, and they do not represent a faithful rendering of the original manuscript. The Word transcripts do not, for instance, tend to reproduce some of the most important and basic TEI elements used by Transcribe Bentham, such as line and page breaks, paragraphing and underlining, as these are unnecessary thanks to Word’s what-you-see-is-what-you-get interface.

As we intend to create a digital repository of Bentham’s works there is, therefore, a real need to take these Word documents – or ‘Legacy Transcripts’, as we’ve become wont to describe them – and encode them just as we would through Transcribe Bentham. This clearly necessitates a great deal of work, which was begun by our colleague Justin Tonra, and continues in earnest. It is complicated by there being 55 boxes’ worth of transcriptions which were saved in large files using now-defunct formats; these transcriptions need to be salvaged using Word and separated into individual folios.

Once we have the Word files the real work can begin, though we were faced with an immediate dilemma about how to proceed. We could check and revise each transcription against the original folio and mark it up using all the Transcribe Bentham tags, but by doing so we might only complete a handful of boxes over the next few months. Instead, it was decided that we would aim to get as much material converted into XML as possible, thereby getting a larger amount of material into such a state that it will be ready to be made accessible to the public. This will be done by lightly encoding each transcript, mainly using only header and paragraph tags.

A lightly-encoded Legacy Transcript, showing header, paragraph and underlining tags.
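As a rough illustration of the light encoding described above, here is a minimal Python sketch that wraps a heading and some paragraphs in TEI-style tags. It is not the Bentham Project’s actual workflow: the function names, the use of `_underscores_` to mark underlined words in the plain text, and the choice of `<head>`, `<p>` and `<hi rend="underline">` are all assumptions made for the example, and the output is only a body fragment rather than a complete, valid TEI document with its header.

```python
import re
import xml.etree.ElementTree as ET

# Assumption for this sketch: underlined words are marked _like this_
# in the plain text taken from the Word file.
UNDERLINE = re.compile(r"_(.+?)_")


def add_paragraph(parent, text):
    """Append a <p> to parent, turning _marked_ spans into <hi rend="underline">."""
    p = ET.SubElement(parent, "p")
    pos = 0
    last = None  # most recent <hi> child; text after it goes in its tail
    for match in UNDERLINE.finditer(text):
        before = text[pos:match.start()]
        if last is None:
            p.text = (p.text or "") + before
        else:
            last.tail = (last.tail or "") + before
        last = ET.SubElement(p, "hi", rend="underline")
        last.text = match.group(1)
        pos = match.end()
    rest = text[pos:]
    if last is None:
        p.text = (p.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return p


def lightly_encode(heading, paragraphs):
    """Return a lightly encoded fragment: one heading plus paragraphs."""
    body = ET.Element("body")
    div = ET.SubElement(body, "div")
    ET.SubElement(div, "head").text = heading
    for paragraph in paragraphs:
        add_paragraph(div, paragraph)
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    # Placeholder text, not a real Bentham transcript.
    print(lightly_encode("Heading", ["Plain text.", "Some _underlined_ words."]))
```

Building the fragment with an XML library rather than by joining strings means that characters such as `&` and `<` in the transcript are escaped automatically, so the result stays well-formed and can be given fuller markup later.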

We will create valid TEI-compliant XML documents, an example of which can be seen below. This clearly does not look as sophisticated as manuscripts encoded through Transcribe Bentham, but such light encoding provides a working XML file, an extremely useful and valuable starting point, and something which can be edited in more detail at a later date.

The same lightly-encoded Legacy Transcript, rendered in Firefox.

A manuscript encoded through Transcribe Bentham, rendered in Firefox.

The conversion of our Legacy Transcripts complements the great labours of Transcribe Bentham’s contributors, and it might even be possible to ask volunteers to add more thorough encoding at some point in the future. The work will also contribute to the ultimate aim of producing a full-text searchable digital catalogue of the Bentham Papers, open to all, which will be housed in UCL’s Digital Collections.

Update: a small Legacy Transcript experiment is now running at the Transcription Desk

There is plenty to do, so wish us luck!