Archive for the 'Transcription' Category

Project update – join us at the Bentham Hackathon with IBM

By Louise Seaward, on 23 August 2017

We’re here with news of an exciting event which will take place in October 2017.  UCL have teamed up with the technology company IBM to organise a ‘Bentham Hackathon’, where participants can work together to explore how digital tools can help us to research Bentham’s philosophy.

For anyone unfamiliar with the term, ‘hackathon’ is a portmanteau of the words ‘hack’ and ‘marathon’.  It originally referred to an intensive meeting where groups of computer developers collaborated on software projects.  The meaning of the term has now expanded and it is often applied to cultural or educational events with a technical element, which are designed to generate new ideas and collaborations.  For more on hackathons, have a look at Wikipedia or the useful ‘How to Guide for hackathons in the cultural sector’ produced by the Europeana Space project.

The Bentham Hackathon will take place over the weekend of 20-22 October 2017 at UCL BaseKX.  The Bentham Project, in association with UCL Centre for Digital Humanities and UCL Innovation and Enterprise, will be working with IBM to explore the following question:

How can digital technologies help us to research Bentham’s philosophy?

The Bentham Hackathon is an intriguing opportunity for participants to play around with thousands upon thousands of images, transcripts and texts of Bentham’s writings, many of which have been produced in the course of the Transcribe Bentham crowdsourcing initiative.  Let’s see how these amazing resources can be explored and analysed with IBM’s cutting-edge technologies!

We have set four suggested challenges for participants in the Hackathon to work on – although other ideas may emerge in the course of the event.

  1. How can we use keyword searching to explore Bentham’s writings?
  2. Can we use technology to decipher Bentham’s difficult handwriting?
  3. Can we build a user-friendly interface for navigating and transcribing documents?
  4. Can we build a more user-friendly version of the Transcribe Bentham crowdsourcing platform?

Anyone interested in these questions is very welcome to join us at the Bentham Hackathon.  The Hackathon is a free event and there are no pre-requisites for participation.

For technical types, this is a great chance to work with IBM and learn new skills.  Those interested in history, philosophy and Bentham can also give their input to help ensure that digital tools work to enhance learning and research in the humanities. Any Transcribe Bentham volunteers who are close to London might also find the event interesting – your knowledge of Bentham and the process of transcription would be invaluable!

The Hackathon will last for the weekend, starting with an evening presentation on Friday 20 October.  Catering will be provided and participants can get involved in the whole weekend, or just pop in for a while.

The Bentham Hackathon will help us to showcase Bentham’s enormous contribution to philosophical thought, including the way in which his ideas on education inspired the founders of UCL.  And we are hopeful that the innovations developed over the course of this weekend will suggest some new ways to use digital technologies in humanities research.

For more information, check out the Bentham Hackathon webpage or contact us.

Transcription Update – 22 July to 18 August 2017

By Louise Seaward, on 21 August 2017

Hello one and all!  Welcome to our update on the latest transcription figures.  The volunteers have been racing ahead, transcribing close to 50 pages a week over the past month – which is very impressive!  We would like to take this opportunity to thank all of our dedicated volunteers for their hard work.

These are the latest statistics as of 18 August 2017.

18,775 manuscript pages have now been transcribed or partially-transcribed.  Of these transcripts, 17,774 (94%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 189 manuscript pages.  This means that an average of 47 pages have been transcribed each week during the past month.
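For anyone who would like to check the arithmetic behind these figures, here is a minimal sketch in Python.  It is not the project's own tooling: the variable names are illustrative, the earlier total of 18,586 is taken from the 21 July update, and rounding the weekly average to the nearest whole page is an assumption about how the figure is reported.

```python
# Running totals of transcribed pages, as reported in the updates.
total_21_july = 18_586    # from the 21 July 2017 update
total_18_august = 18_775  # from this update

# Pages worked on during the four-week period.
pages_this_period = total_18_august - total_21_july

# Weekly average, rounded to the nearest whole page (assumed convention).
weeks = 4
weekly_average = round(pages_this_period / weeks)

print(pages_this_period, weekly_average)
```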

The more detailed progress chart is as follows:

Box No. of manuscripts worked on No. of manuscripts in box Completion
Box 1 769 794 96%
Box 2 729 753 96%
Box 3 0 724 0%
Box 4 50 694 7%
Box 5 201 290 69%
Box 6 2 246 1%
Box 7 6 165 3%
Box 8 24 284 8%
Box 9 56 265 21%
Box 10 116 456 25%
Box 11 24 480 5%
Box 12 179 615 29%
Box 13 23 359 6%
Box 14 275 510 53%
Box 15 86 814 10%
Box 16 12 254 4%
Box 18 67 192 34%
Box 23 1 256 1%
Box 26 193 374 51%
Box 27 350 350 COMPLETE
Box 29 22 122 18%
Box 30 5 193 2%
Box 31 21 302 6%
Box 32 10 158 6%
Box 34 41 398 10%
Box 35 287 439 65%
Box 36 38 418 9%
Box 37 37 487 7%
Box 38 238 424 56%
Box 39 12 282 4%
Box 41 88 572 15%
Box 42 118 910 12%
Box 44 53 201 26%
Box 47 1 466 1%
Box 50 180 198 90%
Box 51 388 939 41%
Box 52 7 609 1%
Box 54 0 205 0%
Box 57 20 420 4%
Box 60 3 183 1%
Box 62 78 564 13%
Box 63 157 345 45%
Box 67 0 407 0%
Box 68 0 414 0%
Box 70 308 347 88%
Box 71 663 663 COMPLETE
Box 72 614 664 92%
Box 73 151 151 COMPLETE
Box 75 4 77 5%
Box 79 199 199 COMPLETE
Box 81 4 488 1%
Box 87 13 604 2%
Box 95 126 147 85%
Box 96 534 539 99%
Box 97 151 295 51%
Box 98 225 499 45%
Box 100 214 429 49%
Box 104 3 502 1%
Box 106 236 581 40%
Box 107 523 542 96%
Box 110 15 671 2%
Box 115 277 307 90%
Box 116 795 865 91%
Box 117 511 853 59%
Box 118 278 880 31%
Box 119 645 990 65%
Box 120 685 685 COMPLETE
Box 121 150 529 28%
Box 122 309 728 42%
Box 123 45 437 10%
Box 124 20 382 5%
Box 135 103 571 18%
Box 137 1 499 1%
Box 139 40 579 6%
Box 141 95 380 25%
Box 149 88 581 15%
Box 150 972 972 COMPLETE
Box 169 248 728 34%
Add MS 35537 735 744 98%
Add MS 35538 824 858 96%
Add MS 35539 883 947 93%
Add MS 35540 947 1012 93%
Add MS 35541 999 1258 79%
Add MS 35547 35 701 4%
Add MS 35549 24 364 6%
Add MS 35550 115 637 18%
Overall 18,775 43,416 43%
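
The completion column can be recomputed from the two counts in each row.  What follows is a minimal sketch in Python; the function name `completion_label` is ours, for illustration only.  The published percentages appear to be rounded down rather than to the nearest whole number, and any box that has been started appears to be shown as at least 1%.  Both conventions are inferred from the chart and should be read as assumptions, not as documented rules.

```python
def completion_label(worked_on: int, in_box: int) -> str:
    """Return a completion label in the style of the progress chart."""
    if in_box <= 0:
        raise ValueError("a box must contain at least one manuscript")
    if worked_on >= in_box:
        return "COMPLETE"
    if worked_on == 0:
        return "0%"
    # Assumed convention: round down, but never show a started box as 0%.
    percent = max(1, worked_on * 100 // in_box)
    return f"{percent}%"


# A few rows from the chart: (box, worked on, in box).
rows = [
    ("Box 1", 769, 794),
    ("Box 6", 2, 246),
    ("Box 27", 350, 350),
    ("Overall", 18_775, 43_416),
]
for box, worked_on, in_box in rows:
    print(box, completion_label(worked_on, in_box))
```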

Transcription Update – 24 June to 21 July 2017

By Louise Seaward, on 21 July 2017

Happy Friday!  It’s time for the latest statistics update. The transcribers have been productive with their typing lately and as always, we are very grateful for their help in advancing knowledge about Bentham’s philosophy.

These are the latest statistics as of 21 July 2017.

18,586 manuscript pages have now been transcribed or partially-transcribed.  Of these transcripts, 17,540 (94%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 127 manuscript pages.  This means that an average of 32 pages have been transcribed each week during the past month.

The more detailed progress chart is as follows:

Box No. of manuscripts worked on No. of manuscripts in box Completion
Box 1 769 794 96%
Box 2 729 753 96%
Box 4 50 694 7%
Box 5 201 290 69%
Box 6 2 246 1%
Box 7 6 165 3%
Box 8 24 284 8%
Box 9 56 265 21%
Box 10 116 456 25%
Box 11 24 480 5%
Box 12 179 615 29%
Box 13 23 359 6%
Box 14 221 510 43%
Box 15 86 814 10%
Box 16 12 254 4%
Box 18 67 192 34%
Box 23 1 256 1%
Box 26 146 374 39%
Box 27 350 350 COMPLETE
Box 29 22 122 18%
Box 30 5 193 2%
Box 31 21 302 6%
Box 32 9 158 5%
Box 34 41 398 10%
Box 35 287 439 65%
Box 36 38 418 9%
Box 37 37 487 7%
Box 38 238 424 56%
Box 39 12 282 4%
Box 41 88 572 15%
Box 42 110 910 12%
Box 44 53 201 26%
Box 47 1 466 1%
Box 50 180 198 90%
Box 51 388 939 41%
Box 52 7 609 1%
Box 54 0 205 0%
Box 57 20 420 4%
Box 60 3 183 1%
Box 62 78 564 13%
Box 63 156 345 45%
Box 70 308 347 88%
Box 71 663 663 COMPLETE
Box 72 614 664 92%
Box 73 151 151 COMPLETE
Box 75 4 77 5%
Box 79 199 199 COMPLETE
Box 81 4 488 1%
Box 87 13 604 2%
Box 95 126 147 85%
Box 96 534 539 99%
Box 97 151 295 51%
Box 98 225 499 45%
Box 100 214 429 49%
Box 104 3 502 1%
Box 106 236 581 40%
Box 107 523 542 96%
Box 110 15 671 2%
Box 115 277 307 90%
Box 116 795 865 91%
Box 117 507 853 59%
Box 118 267 880 30%
Box 119 645 990 65%
Box 120 685 685 COMPLETE
Box 121 150 529 28%
Box 122 309 728 42%
Box 123 45 437 10%
Box 124 18 382 4%
Box 135 88 571 15%
Box 139 40 579 6%
Box 141 95 380 25%
Box 149 88 581 15%
Box 150 972 972 COMPLETE
Box 169 236 728 32%
Add MS 35537 734 744 98%
Add MS 35538 824 858 96%
Add MS 35539 883 947 93%
Add MS 35540 947 1012 93%
Add MS 35541 994 1258 79%
Add MS 35547 34 701 4%
Add MS 35549 24 364 6%
Add MS 35550 90 637 14%
Overall 18,586 41,372 44%

Transcription Update – 27 May to 23 June 2017

By Louise Seaward, on 30 June 2017

Howdy.  Another month has rolled on by and we’re here to let you know about the latest activities of our transcribers.  We are very grateful to everyone who has been transcribing lately.

These are the latest statistics as of 23 June 2017.

18,459 manuscript pages have now been transcribed or partially-transcribed.  Of these transcripts, 17,394 (94%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 104 manuscript pages.  This means that an average of 26 pages have been transcribed each week during the past month.

The more detailed progress chart is as follows:

Box No. of manuscripts worked on No. of manuscripts in box Completion
Box 1 769 794 96%
Box 2 729 753 96%
Box 4 48 694 6%
Box 5 201 290 69%
Box 6 2 246 1%
Box 7 6 165 3%
Box 8 24 284 8%
Box 9 56 265 21%
Box 10 116 456 25%
Box 11 24 480 5%
Box 12 179 615 29%
Box 13 21 359 5%
Box 14 167 510 32%
Box 15 86 814 10%
Box 16 12 254 4%
Box 18 67 192 34%
Box 23 1 256 1%
Box 26 113 374 30%
Box 27 350 350 COMPLETE
Box 29 22 122 18%
Box 30 5 193 2%
Box 31 21 302 6%
Box 32 1 158 1%
Box 34 41 398 10%
Box 35 287 439 65%
Box 36 38 418 9%
Box 37 36 487 7%
Box 38 238 424 56%
Box 39 12 282 4%
Box 41 88 572 15%
Box 42 103 910 11%
Box 44 53 201 26%
Box 47 1 466 1%
Box 50 180 198 90%
Box 51 388 939 41%
Box 52 7 609 1%
Box 54 0 205 0%
Box 57 20 420 4%
Box 60 3 183 1%
Box 62 78 564 13%
Box 63 156 345 45%
Box 70 308 347 88%
Box 71 663 663 COMPLETE
Box 72 614 664 92%
Box 73 151 151 COMPLETE
Box 75 3 77 3%
Box 79 199 199 COMPLETE
Box 81 3 488 1%
Box 87 13 604 2%
Box 95 126 147 85%
Box 96 534 539 99%
Box 97 151 295 51%
Box 98 225 499 45%
Box 100 214 429 49%
Box 104 3 502 1%
Box 106 236 581 40%
Box 107 523 542 96%
Box 110 15 671 2%
Box 115 277 307 90%
Box 116 795 865 91%
Box 117 507 853 59%
Box 118 267 880 30%
Box 119 644 990 65%
Box 120 685 685 COMPLETE
Box 121 150 529 28%
Box 122 309 728 42%
Box 123 45 437 10%
Box 124 18 382 4%
Box 135 83 571 14%
Box 139 40 579 6%
Box 141 95 380 25%
Box 149 88 581 15%
Box 150 972 972 COMPLETE
Box 169 225 728 30%
Add MS 35537 734 744 98%
Add MS 35538 824 858 96%
Add MS 35539 883 947 93%
Add MS 35540 947 1012 93%
Add MS 35541 993 1258 78%
Add MS 35547 34 701 4%
Add MS 35549 24 364 6%
Add MS 35550 90 637 14%
Overall 18,459 41,372 44%

Project Update – teaching a computer to READ Bentham

By Louise Seaward, on 9 June 2017

The difficulty of Bentham’s handwriting is notorious.  At the Bentham Project, we have years of experience of transcribing Bentham but you will still regularly find us hunched over a manuscript with a magnifying glass or blankly staring at a digital image on a computer screen, zooming in and out on a particular word.

One of the Bentham Project’s favourite tools

Across the last few years, we have been working closely with various teams of computer scientists in the hope of making progress on the automated recognition of Bentham’s writing.  This collaboration started under the tranScriptorium project in 2013 and now continues in its successor project READ (Recognition and Enrichment of Archival Documents).

READ’s mission is to make archival collections more accessible through the development and dissemination of Handwritten Text Recognition (HTR) technology.  This technology is freely available through the Transkribus platform.  Using machine learning algorithms, it is possible to teach a computer to read a particular style of writing.  The technology is trained by being shown images of documents and their accurate transcriptions.  Anyone can start a test project with around 20,000 words, or around 100 pages, of transcribed material.

Under the tranScriptorium project, we initially had some success in training a model to process manuscripts from the Bentham collection.  Using around 900 pages of Bentham images and transcripts, researchers from the Pattern Recognition and Human Language Technology (PRHLT) research centre at the Universitat Politècnica de València created an HTR model for us using statistical algorithms called Hidden Markov Models.  This model was able to produce relatively accurate transcriptions of the Bentham papers, with a Character Error Rate of around 18% (meaning that around 82% of the characters in a transcript would be correct).
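
To make the Character Error Rate figure more concrete, here is a minimal sketch in Python of one common way to compute it: the edit (Levenshtein) distance between the automatic transcript and a reference transcript, divided by the length of the reference.  This is a widely used definition and an assumption on our part; the exact evaluation procedure used for the Bentham models is not described here.  The function names and the sample strings are invented for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # substitution, or a match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: the automatic transcript drops one character.
reference = "the greatest happiness"
hypothesis = "the greatest hapiness"
print(f"{character_error_rate(reference, hypothesis):.1%}")
```

Because extra characters also count as errors, the rate is not strictly the share of incorrect characters, but at low error rates the two are close.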

In the first stage of the READ project, we have already been able to enhance the accuracy of the HTR technology.  The team at the Computational Intelligence Technology Lab (CITlab) at the University of Rostock created a new model using this same dataset.  This model was based on Neural Networks, computational models for machine learning which are loosely inspired by the way the human brain works.  This model can produce automatic transcripts of the Bentham papers with a Character Error Rate of only 5-10%.

Now it’s time to take things up a notch!  In our first experiments with HTR, we put forward ‘easier’ documents for the computer to process.  These tended to be pages written by Bentham’s secretaries where the layout is clear and the handwriting relatively neat.  Now we want to test how the computer copes with some of the worst examples of Bentham’s writing.  We are producing a new set of training data based on a selection of manuscripts which were written by Bentham himself when the philosopher was in his eighties.  Box xxx of the Bentham Papers in UCL Special Collections contains the Blackstone Familiarized papers.  These were part of Bentham’s lifelong obsession with critiquing the work of William Blackstone, the English jurist who was most famous for his Commentaries on the Laws of England (1765-9).  Bentham first turned against Blackstone as a teenage student when he attended his lectures at the University of Oxford.  In several published works and unpublished papers, Bentham argued that Blackstone was an apologist for the obvious inadequacies in the English legal system and blind to the necessity of reform.

Screenshot of page from Blackstone Familiarized in Transkribus. UCL Special Collections, Bentham Papers, Box xxx, fo. 156 [Image: UCL Special Collections]

The Blackstone Familiarized papers have been digitised by UCL Creative Media Services and transcribed by Professor Philip Schofield, the Director of the Bentham Project and General Editor of the Collected Works of Jeremy Bentham.  The images were uploaded to Transkribus and Chris Riley, a PhD student from the Faculty of Laws, has been marking the lines of text on each image and then copying the transcripts into the platform.

We are aiming to produce 200 pages of ‘difficult’ Bentham training data which can be fed into a new version of our latest HTR model.  We are also interested in comparing the accuracy of different models.  How far does this new material enhance the accuracy of the models we already have and would it be worthwhile to have separate models for Bentham himself and his secretaries?

The automated recognition of Bentham’s handwriting would considerably speed up the full transcription of Bentham’s writings and the publication of his Collected Works.  We also want to experiment with HTR technology in a new version of Transcribe Bentham where volunteer transcribers could ask the computer to provide suggested readings of words that they find difficult to decipher.  Until then, we have some more transcribing to do!