Archive for the 'Transcription' Category

Over 20,000 pages transcribed by volunteers!

By Louise Seaward, on 13 April 2018

Big news at Transcribe Bentham HQ!  Volunteers have now transcribed over 20,000 pages of Bentham’s writings at our online Transcription Desk.

This is a monumental achievement that all our volunteers, past and present, should be incredibly proud of.  It comes only around two years after we hit the 15,000 pages milestone in January 2016.  Through their transcription work, our volunteers have created a formidable online resource used by scholars inside and outside the Bentham Project, as well as interested members of the public around the world.  Thank you all!

Here are the full statistics for the initiative – as of 30 March 2018 (date of our last update).

20,120 manuscript pages have now been transcribed or partially-transcribed. Of these transcripts, 19,136 (95%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 163 manuscript pages. This means that an average of 41 pages have been transcribed each week during the past month.

Check out the Benthamometer for more information on how much has been transcribed from each box of Bentham’s papers!

Transcription Update – 3 February to 2 March 2018

By Louise Seaward, on 9 March 2018

We’re back with another insight into the recent activity of our volunteer transcribers.  Heartfelt thanks go out to everyone who has taken the time to transcribe something for us!

Here are the full statistics for the initiative – as of 2 March 2018.

19,957 manuscript pages have now been transcribed or partially-transcribed. Of these transcripts, 19,090 (95%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 155 manuscript pages. This means that an average of 39 pages have been transcribed each week during the past month.

Check out the Benthamometer for more information on how much has been transcribed from each box of Bentham’s papers!

Project Update – Bentham vs the computer

By Louise Seaward, on 23 February 2018

Throughout its long history, the Bentham Project has always been interested in the way in which technological advances could be integrated into its work on the scholarly edition of Bentham’s Collected Works.  Transcribe Bentham is currently a proud partner in an international collaboration focused on using innovative computer science techniques to process historical manuscripts.  The mission of the READ (Recognition and Enrichment of Archival Documents) project is to make archival collections more accessible through the development and dissemination of Handwritten Text Recognition (HTR) technology.

This technology is freely available through the Transkribus platform.  Using machine learning algorithms, it is possible to teach a computer to read a particular kind of handwriting – even Bentham’s!  The technology is trained by being shown images of documents and their accurate transcriptions.  Thanks to the hard work of the Transcribe Bentham volunteers, we are lucky to have a sizeable collection of transcripts that can be used as training data for automated text recognition.

We last summarised our experiments with HTR in a blog post from June 2017.   At that point, we had used technology from the Computational Intelligence Technology Lab (CITlab) at the University of Rostock to produce a model capable of processing the easier papers from the Bentham collection, largely those written by Bentham’s secretaries.  This model can automatically produce transcripts with a Character Error Rate of between 5 and 10%, meaning that 90-95% of characters in the transcript are correct.  The Bentham model is now publicly available in Transkribus under the title ‘English Writing M1’ and has been applied to other collections of eighteenth- and nineteenth-century English handwriting with some success.
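For readers curious about how a Character Error Rate is worked out, here is a minimal sketch in Python. It assumes the common definition of CER: the number of single-character insertions, deletions and substitutions needed to turn the machine transcript into the correct one, divided by the length of the correct transcript. The post does not say exactly how Transkribus calculates its figure, so the function names and the sample sentences below are purely illustrative.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Count the fewest single-character edits turning reference into hypothesis."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the correct (reference) transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: a checked transcript and an imagined machine reading of it.
reference = "the greatest happiness of the greatest number"
hypothesis = "the greatest hapiness of the greatcst number"

edits = levenshtein(reference, hypothesis)
cer = character_error_rate(reference, hypothesis)
print(f"{edits} edits over {len(reference)} characters, CER = {cer:.1%}")
```

Because inserted characters also count as errors, "90-95% of characters correct" is best read as a convenient approximation of a 5-10% error rate rather than an exact equivalence.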

Screenshot from Transkribus with automatically generated transcript. Box Add 3350, fo. 158, The British Library

Although this model copes well with documents where the handwriting and layout are relatively clear, it struggles to recognise the more difficult examples of writing from Bentham’s own hand.  So we decided to take on the challenge of teaching a computer to read some of the very worst examples of Bentham’s handwriting!

We used the Transkribus platform to create training data based on Boxes 30 and 31 of the Bentham Papers held in UCL Special Collections.  These manuscripts were part of Bentham’s lifelong obsession with critiquing the work of William Blackstone, the English jurist who was most famous for his Commentaries on the Laws of England (1765-9).  To create the training data, we uploaded around 200 digital images to Transkribus, segmented each image into lines and then copied over existing transcripts to match each image.

The resulting model generates transcripts with an average Character Error Rate of 26%.  This error rate is unfortunately too high to automatically produce transcripts suitable for scholarly editing.  Nevertheless, it does have the potential to facilitate the full-text search of the Bentham Papers.  Transkribus now includes sophisticated Keyword Spotting technology, which is capable of finding words and phrases in documents, even if they have been mistranscribed by the computer.
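To see why search can still work on an imperfect transcript, here is a small Python sketch. It is not the Keyword Spotting technology in Transkribus, which this post does not describe in detail; it swaps in simple approximate string matching from Python's standard library as a stand-in. The function name, the similarity threshold and the misread sample line are all invented for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_find(query: str, transcript: str, threshold: float = 0.75):
    """Return (word position, word, similarity) for words that resemble the query."""
    query = query.lower()
    hits = []
    for position, word in enumerate(transcript.lower().split()):
        token = word.strip(".,;:!?()[]'\"")  # ignore surrounding punctuation
        score = SequenceMatcher(None, query, token).ratio()
        if score >= threshold:
            hits.append((position, token, round(score, 2)))
    return hits


# Invented machine transcript containing recognition errors.
machine_transcript = "Commentaries on the Lavvs of Englnd by Blackstonc"

hits = fuzzy_find("blackstone", machine_transcript)
print(hits)
```

An exact search for "blackstone" would miss the misread word, whereas a similarity threshold lets it through. A lower threshold finds more damaged words but also returns more false matches.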

Screenshot from Transkribus with automatically generated transcript.  Box 31, fol. 78, UCL Bentham Papers, Special Collections, University College London

We are working with the Pattern Recognition and Human Language Technology (PRHLT) research centre at the Universitat Politècnica de València and the Digitisation and Digital Preservation group at the University of Innsbruck to present an open-access search functionality for the Bentham Papers.  We are also hoping that volunteers could get involved in this endeavour by checking and correcting the results of significant search queries to ensure their accuracy.

Improving the recognition of Bentham’s handwriting is our other aim and to this end, we will be producing more pages of training data in Transkribus.  The technology moves so fast that this process has already been streamlined thanks to technology from CITlab.  Transkribus can now easily find lines in images (even in documents with complex layouts) and it is also possible to use existing transcripts to automatically train a model, rather than copying them into Transkribus line by line.

If you are interested in following in our footsteps, you are welcome to give Transkribus a try!  You can find more information on the READ website and in the Transkribus How to Guides.

I would like to thank Chris Riley, PhD student and transcription assistant at the Bentham Project, for helping to produce the training data for the latest Bentham model.

Transcription Update – 6 January to 2 February 2018

By Louise Seaward, on 12 February 2018

Excitement is building at Transcribe Bentham HQ as we’re coming very close to the huge milestone of 20,000 pages transcribed by volunteers!  As ever, we extend a heartfelt thank you to our volunteers – your hard work makes all this possible.

Here are the full statistics for the initiative – as of 2 February 2018.

19,802 manuscript pages have now been transcribed or partially-transcribed. Of these transcripts, 18,976 (95%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 177 manuscript pages. This means that an average of 44 pages have been transcribed each week during the past month.

Check out the Benthamometer for more information on how much has been transcribed from each box of Bentham’s papers!

Project Update – Quality and Cost-Efficiency of Transcription

By Louise Seaward, on 1 February 2018

Our January project update is running a little late, but I hope I’ll be forgiven!

Amongst other things, I’ve been busy preparing the first edition of the Transcribe Bentham newsletter, a new monthly email update on the project.  You can expect to find out about the latest volunteer discoveries, our progress towards our transcription goals and other Bentham-related news and events.  Sign up here.

Coming back to the subject of today’s blog post, I wanted to celebrate the publication of a new article on Transcribe Bentham.  ‘“Making such bargain”: Transcribe Bentham and the quality and cost-effectiveness of crowdsourced transcription’ has been published in the journal Digital Scholarship in the Humanities (an open-access version is coming soon).  It was written by former Transcribe Bentham colleagues Dr Tim Causer, Dr Kris Grint, Dr Anna-Maria Sichani and Professor Melissa Terras.  The article is a detailed statistical evaluation of the quality and efficiency of Transcribe Bentham as a transcription project.  It reveals that we have a lot to be proud of!

Dr Tim Causer (Bentham Project, UCL), one of the authors of the new Transcribe Bentham article

We have long known that Transcribe Bentham is ground-breaking in terms of public engagement and access to historical material.  We are working with a fantastic community of volunteers and making thousands of pages of Bentham manuscripts and transcripts freely available online.  This article goes into more depth about other benefits of Transcribe Bentham, in terms of the quality and cost-efficiency of the work done by our volunteers.

The article is a painstaking study of over 4000 transcripts submitted by Transcribe Bentham volunteers and then checked by Transcribe Bentham staff over a 20-month period.  The team analysed the quality of each submitted transcript and the time taken to review and edit it.  This analysis indicated that the work produced by our volunteers is of a very high quality (as we have always suspected!).  Most of the transcripts submitted on the site required only a handful of editorial changes, and nearly half required no changes at all.  It takes Transcribe Bentham staff an average of only 3.5 minutes to check and approve a page submitted by a volunteer.  So we have empirical evidence that the work of our volunteers is accurate enough to be used in a public database and as a basis for further research.

Volunteer transcription is not only accurate, but efficient too.  It is much quicker for Transcribe Bentham staff to check transcripts than to transcribe them from scratch.  There is also significant potential for cost-avoidance, even if we take into account the fact that the Bentham Project has received significant funding to establish and maintain the initiative.  Moreover, Transcribe Bentham contributes hugely to the ongoing work of the Bentham Project in producing the definitive scholarly edition of The Collected Works of Jeremy Bentham.  Bentham Project researchers can make use of transcripts produced by volunteers, so they now have a head start when they are editing Bentham’s papers.  The article also estimates that if our transcribers continue to work at their current rate, Bentham’s papers could be completely transcribed by 2036.  This would be an astonishing achievement, especially considering that the Bentham Project has been working towards this goal since the late 1950s!

We are immensely grateful to our volunteers and do not wish to reduce them to a set of statistics.  Rather, this article is designed to provide evidence of the tangible benefits of crowdsourcing transcription, pointing out the significant success of Transcribe Bentham and also offering a model for other projects that might like to follow our lead.  It really shows the huge contribution that volunteers are making to Bentham scholarship.

If you have any questions about the article, please contact Dr Tim Causer (Bentham Project, UCL).

If you are interested in finding out more about the history of Transcribe Bentham, you can read other articles at our Publications page.