Welcome to Transcribe Bentham!

By Louise Seaward, on 6 December 2017

Jeremy Bentham

‘Many hands make light work. Many hands together make merry work’, wrote the philosopher and reformer Jeremy Bentham (1748–1832) in 1793.

In this spirit, we cordially welcome you to Transcribe Bentham, a double award-winning collaborative initiative which is crowdsourcing the transcription of Bentham’s previously unpublished manuscripts.

Anyone can start transcribing at our Transcription Desk.  Your transcripts will contribute to the production of Bentham’s Collected Works and preserve Bentham’s writings into the future.

Find out more about Transcribe Bentham in the sidebar menu on the left, or scroll down to read the latest news from the Transcribe Bentham blog.  Happy transcribing!

Project Update – Bentham vs the computer

By Louise Seaward, on 23 February 2018

Throughout its long history, the Bentham Project has always been interested in how technological advances could be integrated into its work on the scholarly edition of Bentham’s Collected Works.  Transcribe Bentham is currently a proud partner in an international collaboration focused on using innovative computer science techniques to process historical manuscripts.  The mission of the READ (Recognition and Enrichment of Archival Documents) project is to make archival collections more accessible through the development and dissemination of Handwritten Text Recognition (HTR) technology.

This technology is freely available through the Transkribus platform.  Using machine-learning algorithms, it is possible to teach a computer to read a particular kind of handwriting, even Bentham’s!  The technology is trained by being shown images of documents together with their accurate transcriptions.  Thanks to the hard work of the Transcribe Bentham volunteers, we are lucky to have a sizeable collection of transcripts that can be used as training data for automated text recognition.

We last summarised our experiments with HTR in a blog post from June 2017.  At that point, we had used technology from the Computational Intelligence Technology Lab (CITlab) at the University of Rostock to produce a model capable of processing the easier papers from the Bentham collection, largely those written by Bentham’s secretaries.  This model can automatically produce transcripts with a Character Error Rate of between 5 and 10%, meaning that roughly 90–95% of the characters in the transcript are correct.  The Bentham model is now publicly available in Transkribus under the title ‘English Writing M1’ and has been applied with some success to other collections of eighteenth- and nineteenth-century English handwriting.
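For readers curious about where a figure like ‘5 to 10%’ comes from: the Character Error Rate is conventionally the minimum number of character insertions, deletions and substitutions needed to turn the machine transcript into the correct one, divided by the number of characters in the correct transcript. The short Python sketch below illustrates that standard definition only. It is not Transkribus’s own evaluation code, which may differ in details such as how whitespace or punctuation is normalised, and the function names `levenshtein` and `cer` are hypothetical. Because insertions are counted, the rate can in principle exceed 100%, which is why ‘90–95% of characters correct’ is an approximation.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance: the fewest single-character insertions, deletions and
    substitutions that turn hyp into ref (standard dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference character missing from hyp
                curr[j - 1] + 1,         # extra character in hyp
                prev[j - 1] + (r != h),  # substitution (costs 0 if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length
    (an illustrative definition, assumed rather than taken from Transkribus)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Two misread characters in a 20-character line give a CER of 10%.
    print(cer("liberty of the press", "libertv of tbe press"))
```

In this made-up example, the machine has misread ‘y’ as ‘v’ and ‘h’ as ‘b’, so 2 errors over 20 reference characters gives a CER of 0.1, at the upper end of the range reported for the ‘English Writing M1’ model.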

Screenshot from Transkribus with automatically generated transcript. Box Add 3350, fo. 158, The British Library

Although this model copes well with documents where the handwriting and layout are relatively clear, it struggles to recognise the more difficult examples of writing from Bentham’s own hand.  So we decided to take on the challenge of teaching a computer to read some of the very worst examples of Bentham’s handwriting!

We used the Transkribus platform to create training data based on Boxes 30 and 31 of the Bentham Papers held in UCL Special Collections.  These manuscripts were part of Bentham’s lifelong obsession with critiquing the work of William Blackstone, the English jurist who was most famous for his Commentaries on the Laws of England (1765-9).  To create the training data, we uploaded around 200 digital images to Transkribus, segmented each image into lines and then copied over existing transcripts to match each image.

The resulting model generates transcripts with an average Character Error Rate of 26%.  This error rate is unfortunately too high to automatically produce transcripts suitable for scholarly editing.  Nevertheless, it does have the potential to facilitate the full-text search of the Bentham Papers.  Transkribus now includes sophisticated Keyword Spotting technology, which is capable of finding words and phrases in documents, even if they have been mistranscribed by the computer.

Screenshot from Transkribus with automatically generated transcript. Box 31, fol. 78, UCL Bentham Papers, Special Collections, University College London

We are working with the Pattern Recognition and Human Language Technology (PRHLT) research centre at the Universitat Politècnica de València and the Digitisation and Digital Preservation group at the University of Innsbruck to present an open-access search functionality for the Bentham Papers.  We also hope that volunteers will be able to get involved in this endeavour by checking and correcting the results of significant search queries to ensure their accuracy.

Our other aim is to improve the recognition of Bentham’s handwriting, and to this end we will be producing more pages of training data in Transkribus.  The technology moves so fast that this process has already been streamlined thanks to new tools from CITlab.  Transkribus can now easily find lines in images (even in documents with complex layouts), and it is also possible to use existing transcripts to train a model automatically, rather than copying them into Transkribus line by line.

If you are interested in following in our footsteps, you are welcome to give Transkribus a try!  You can find more information on the READ website and in the Transkribus How to Guides.

I would like to thank Chris Riley, PhD student and transcription assistant at the Bentham Project, for helping to produce the training data for the latest Bentham model.

Transcription Update – 6 January to 2 February 2018

By Louise Seaward, on 12 February 2018

Excitement is building at Transcribe Bentham HQ as we’re coming very close to the huge milestone of 20,000 pages transcribed by volunteers!  As ever, we extend a heartfelt thank you to our volunteers: your hard work makes all this possible.

Here are the full statistics for the initiative – as of 2 February 2018.

19,802 manuscript pages have now been transcribed or partially transcribed. Of these transcripts, 18,976 (95%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 177 manuscript pages. This means that an average of 44 pages have been transcribed each week during the past month.

Check out the Benthamometer for more information on how much has been transcribed from each box of Bentham’s papers!

Project Update – Quality and Cost-Efficiency of Transcription

By Louise Seaward, on 1 February 2018

Our January project update is running a little late, but I hope I’ll be forgiven!

Amongst other things, I’ve been busy preparing the first edition of the Transcribe Bentham newsletter, a new monthly email update on the project.  You can expect to find out about the latest volunteer discoveries, our progress towards our transcription goals and other Bentham-related news and events.  Sign up here.

Coming back to the subject of today’s blog post, I wanted to celebrate the publication of a new article on Transcribe Bentham.  ‘“Making such bargain”: Transcribe Bentham and the quality and cost-effectiveness of crowdsourced transcription’ has been published in the journal Digital Scholarship in the Humanities (an open-access version is coming soon).  It was written by former Transcribe Bentham colleagues Dr Tim Causer, Dr Kris Grint, Dr Anna-Maria Sichani and Professor Melissa Terras.  The article is a detailed statistical evaluation of the quality and efficiency of Transcribe Bentham as a transcription project.  It reveals that we have a lot to be proud of!

Dr Tim Causer (Bentham Project, UCL), one of the authors of the new Transcribe Bentham article

We have long known that Transcribe Bentham is ground-breaking in terms of public engagement and access to historical material.  We are working with a fantastic community of volunteers and making thousands of pages of Bentham manuscripts and transcripts freely available online.  This article goes into more depth about other benefits of Transcribe Bentham, in terms of the quality and cost-efficiency of the work done by our volunteers.

The article is a painstaking study of over 4,000 transcripts submitted by Transcribe Bentham volunteers and then checked by Transcribe Bentham staff over a 20-month period.  The team analysed the quality of each submitted transcript and the time taken to review and edit it.  This analysis indicated that the work produced by our volunteers is of a very high quality (as we have always suspected!).  Most of the transcripts submitted on the site required only a handful of editorial changes, and nearly half required no changes at all.  It takes Transcribe Bentham staff an average of only 3.5 minutes to check and approve a page submitted by a volunteer.  So we have empirical evidence that the work of our volunteers is accurate enough to be used in a public database and as a basis for further research.

Volunteer transcription is not only accurate, but efficient too.  It is much quicker for Transcribe Bentham staff to check transcripts than to transcribe them from scratch.  There is also significant potential for cost-avoidance, even if we take into account the fact that the Bentham Project has received substantial funding to establish and maintain the initiative.  Moreover, Transcribe Bentham contributes hugely to the ongoing work of the Bentham Project in producing the definitive scholarly edition of The Collected Works of Jeremy Bentham.  Bentham Project researchers can make use of transcripts produced by volunteers, so they now have a head start when they are editing Bentham’s papers.  The article also estimates that if our transcribers continue to work at their current rate, Bentham’s papers could be completely transcribed by 2036.  This would be an astonishing achievement, especially considering that the Bentham Project has been working towards this goal since the late 1950s!

We are immensely grateful to our volunteers and do not wish to reduce them to a set of statistics.  Rather, this article is designed to provide evidence of the tangible benefits of crowdsourcing transcription, pointing out the significant success of Transcribe Bentham and also offering a model for other projects that might like to follow our lead.  It really shows the huge contribution that volunteers are making to Bentham scholarship.

If you have any questions about the article, please contact Dr Tim Causer (Bentham Project, UCL).

If you are interested in finding out more about the history of Transcribe Bentham, you can read other articles at our Publications page.

Transcription Update – 9 December 2017 to 5 January 2018

By Louise Seaward, on 5 January 2018

Happy New Year from everyone at the Bentham Project!  Transcribe Bentham continues into another year and we are happy to have so many diligent volunteers working with us.  A big festive thank you to everyone who has been transcribing during the past month.

Here are the full statistics for the initiative – as of 5 January 2018.

19,625 manuscript pages have now been transcribed or partially transcribed. Of these transcripts, 18,886 (96%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 153 manuscript pages. This means that an average of 38 pages have been transcribed each week during the past month.

Check out the Benthamometer for more information on how much has been transcribed from each box of Bentham’s papers!

Project Update – Box 26 completely transcribed!

By Louise Seaward, on 21 December 2017

Hello!  We’ve one last piece of news before we sign off for Christmas.  We are delighted to announce that Box 26 of the Bentham Papers has now been completely transcribed.  This is a huge achievement, as Box 26 contains more than 350 folios, many of which are yet to be published as part of Bentham’s Collected Works.  We must give special congratulations to the transcriber Gill Hague (username: ohsoldgirl), who has transcribed the vast majority of this particular box.  We would also like to take this moment to thank all of the Transcribe Bentham volunteers for their continued contributions to other boxes on the Transcription Desk.

Box 26 contains material that was written by Bentham between 1808 and 1822 concerning trial by jury, libel law, and the liberty of the press. In particular, Box 26 includes draft material for Elements of the Art of Packing, as applied to Special Juries, Particularly in Cases of Libel Law.  This work was first printed in 1810 but remained unpublished until 1821, because Bentham’s friend, the legal reformer Sir Samuel Romilly, had advised that publishing the work could lead to Bentham’s prosecution.

Throughout Box 26, there are clear signs that Bentham was aware that he was being especially radical in his writings.  On folio one hundred and forty-three, Bentham wrote:

‘My endeavour shall be to make myself understood as far as I dare. But what I am sure I can not forget, and what it concerns you all not to forget, [is] that in this Country, with its boasted Constitution, there is now no liberty.’

Bentham then mentioned by name Lord Sidmouth, who at the time was Home Secretary and was known for his particularly heavy-handed approach to political dissent, as well as Lord Castlereagh, then Secretary of State for Foreign Affairs. Bentham also began to write a third name, but the manuscript reads only ‘Lord |   |’, with a space left blank, presumably out of caution.

Bentham Papers, UCL Special Collections, Box xxvi, fol. 143 [Image courtesy of UCL Special Collections].

Bentham argued that there should be greater clarity in written legislation and that the legal and political system should serve the interests of the people, rather than the ‘sinister interests’ of lawyers and politicians.  If an adequate definition of libel could not be achieved, Bentham asserted on folio one hundred and fifty-three, ‘no man can without imposture call himself a friend to the liberty of the press’.

We’ll close by hearing from the volunteer Gill Hague about her experience of transcribing Box 26.

‘I have been transcribing Bentham for some six years and usually look in the box index to identify topics which I think will be interesting to work on.   I thought Juries would be an interesting topic as one could relate it to current practice and so it proved.  Seeing that the box was untouched I thought I would start on page one and see if I could work my way through in sequence.   It gave the opportunity to see Bentham’s arguments unfold, and how the expression of his arguments were revised and refined.  As all transcribers will know Bentham often repeats words and phrases, so sometimes deciphering his handwriting – these documents date from the 1820’s when it was not at its clearest – was made easier by having come across words, particularly the legal terms,  used on other documents in the box.   Fortunately, there were also a number of pages of fair copy which were easier to transcribe.   Aside from the handwriting, the hardest pages were the double sized ones with text in four columns, I could type the line break code in my sleep once I’d worked my way through those and, at about 1200 words a page including the markup, they are roughly equivalent to three of a ‘standard’ one – if one can say there is such a thing.   All in all, it was an interesting and satisfying exercise and I hope to start on another box in the New Year.’

Thank you Gill for your efforts!

We would like to wish all our volunteers and readers a Merry Christmas!  The Transcription Desk will remain open over the holidays for those who wish to transcribe.  UCL is closed from 23 December to 2 January so the Transcription Desk will be largely unstaffed across that time.  So we look forward to seeing you all in 2018.


Thanks go to Chris Riley (PhD student at the Bentham Project) for the research on Box 26 that appears in this blog post.