June | 2017 | UCL Transcribe Bentham

Archive for June, 2017

Transcription Update – 27 May to 23 June 2017

By uczwlse, on 30 June 2017

Howdy. Another month has rolled on by and we’re here to let you know about the latest activities of our transcribers. We are very grateful to everyone who has been transcribing lately.

These are the latest statistics as of 23 June 2017.

18,459 manuscript pages have now been transcribed or partially transcribed. Of these transcripts, 17,394 (94%) have been checked and approved by TB staff.

Over the past four weeks, volunteers have worked on a total of 104 manuscript pages. This means that an average of 26 pages have been transcribed each week during the past month.
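For anyone who wants to check the arithmetic behind these headline figures, here is a minimal Python sketch. The variable names are our own illustration and the numbers are simply those quoted above.

```python
# Headline figures quoted in this update (as of 23 June 2017).
transcribed_pages = 18_459   # pages transcribed or partially transcribed
approved_pages = 17_394      # transcripts checked and approved by TB staff
pages_this_period = 104      # pages worked on over the past four weeks
weeks_in_period = 4

# Share of transcripts that have been approved.
approved_share = approved_pages / transcribed_pages

# Average number of pages worked on per week in this period.
weekly_average = pages_this_period / weeks_in_period

print(f"Approved: {approved_share:.0%}")
print(f"Weekly average: {weekly_average:.0f} pages")
```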

The more detailed progress chart is as follows:

Box	No. of manuscripts worked on	No. of manuscripts in box	Completion
Box 1	769	794	96%
Box 2	729	753	96%
Box 4	48	694	6%
Box 5	201	290	69%
Box 6	2	246	1%
Box 7	6	165	3%
Box 8	24	284	8%
Box 9	56	265	21%
Box 10	116	456	25%
Box 11	24	480	5%
Box 12	179	615	29%
Box 13	21	359	5%
Box 14	167	510	32%
Box 15	86	814	10%
Box 16	12	254	4%
Box 18	67	192	34%
Box 23	1	256	1%
Box 26	113	374	30%
Box 27	350	350	COMPLETE
Box 29	22	122	18%
Box 30	5	193	2%
Box 31	21	302	6%
Box 32	1	158	1%
Box 34	41	398	10%
Box 35	287	439	65%
Box 36	38	418	9%
Box 37	36	487	7%
Box 38	238	424	56%
Box 39	12	282	4%
Box 41	88	572	15%
Box 42	103	910	11%
Box 44	53	201	26%
Box 47	1	466	1%
Box 50	180	198	90%
Box 51	388	939	41%
Box 52	7	609	1%
Box 54	0	205	0%
Box 57	20	420	4%
Box 60	3	183	1%
Box 62	78	564	13%
Box 63	156	345	45%
Box 70	308	347	88%
Box 71	663	663	COMPLETE
Box 72	614	664	92%
Box 73	151	151	COMPLETE
Box 75	3	77	3%
Box 79	199	199	COMPLETE
Box 81	3	488	1%
Box 87	13	604	2%
Box 95	126	147	85%
Box 96	534	539	99%
Box 97	151	295	51%
Box 98	225	499	45%
Box 100	214	429	49%
Box 104	3	502	1%
Box 106	236	581	40%
Box 107	523	542	96%
Box 110	15	671	2%
Box 115	277	307	90%
Box 116	795	865	91%
Box 117	507	853	59%
Box 118	267	880	30%
Box 119	644	990	65%
Box 120	685	685	COMPLETE
Box 121	150	529	28%
Box 122	309	728	42%
Box 123	45	437	10%
Box 124	18	382	4%
Box 135	83	571	14%
Box 139	40	579	6%
Box 141	95	380	25%
Box 149	88	581	15%
Box 150	972	972	COMPLETE
Box 169	225	728	30%
Add MS 35537	734	744	98%
Add MS 35538	824	858	96%
Add MS 35539	883	947	93%
Add MS 35540	947	1012	93%
Add MS 35541	993	1258	78%
Add MS 35547	34	701	4%
Add MS 35549	24	364	6%
Add MS 35550	90	637	14%
Overall	18,459	41,372	44%

Filed under News, Transcription


Project Update – Report from the British Academy soirée

By uczwlse, on 23 June 2017

A guest post by Dr Tim Causer who represented Transcribe Bentham and the Bentham Project at the latest British Academy soirée

Professor Philip Schofield and Dr Tim Causer represented the Bentham Project at the British Academy soirée on 20 June. Over 500 people attended the event and heard talks from a number of British Academy Fellows, and visited stands featuring the work of British Academy Research Projects, of which the Bentham Project is one.

Professor Schofield and Dr Causer, stationed in the Council Room beside Henry Pickersgill’s 1829 portrait of Bentham, discussed with visitors the work of the Project, the production of The Collected Works of Jeremy Bentham, recent open-access publications from UCL Press, and the ongoing and exciting work of the European Commission-funded READ project. Of particular interest to visitors was the Transkribus platform and its Handwritten Text Recognition tools, and the prototype ‘ScanTent’ which, when used in conjunction with the free and forthcoming DocScan app, allows users to efficiently capture images of archival and printed material.

Professor Philip Schofield and Dr Tim Causer at the British Academy

A good time was had by all, particularly under the beneficent eye of Mr Bentham himself!

Filed under Events, News, READ project


Project Update – teaching a computer to READ Bentham

By uczwlse, on 9 June 2017

The difficulty of Bentham’s handwriting is notorious. At the Bentham Project, we have years of experience of transcribing Bentham but you will still regularly find us hunched over a manuscript with a magnifying glass or blankly staring at a digital image on a computer screen, zooming in and out on a particular word.

One of the Bentham Project’s favourite tools

Over the last few years, we have been working closely with various teams of computer scientists in the hope of making progress on the automated recognition of Bentham’s writing. This collaboration started under the tranScriptorium project in 2013 and now continues in its successor project READ (Recognition and Enrichment of Archival Documents).

READ’s mission is to make archival collections more accessible through the development and dissemination of Handwritten Text Recognition (HTR) technology. This technology is freely available through the Transkribus platform. Using machine learning algorithms, it is possible to teach a computer to read a particular style of writing. The technology is trained by being shown images of documents together with their accurate transcriptions. Anyone can start a test project with around 20,000 words or around 100 pages.

Under the tranScriptorium project, we initially had some success in training a model to process manuscripts from the Bentham collection. Using around 900 pages of Bentham images and transcripts, researchers from the Pattern Recognition and Human Language Technology (PRHLT) research centre at the Universitat Politècnica de València created an HTR model for us using statistical algorithms called Hidden Markov Models. This model was able to produce relatively accurate transcriptions of the Bentham papers, with a Character Error Rate of around 18% (meaning that around 82% of the characters in a transcript would be correct).
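To make the Character Error Rate figure more concrete, here is a minimal Python sketch of how such a rate is commonly calculated: count the fewest single-character insertions, deletions and substitutions needed to turn the automatic transcript into the correct one (the Levenshtein distance), then divide by the length of the correct transcript. This is an illustration under that assumption, not the evaluation code used by PRHLT or in Transkribus. The function names and the sample strings are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Fewest single-character edits needed to turn hypothesis into reference."""
    # previous[j] is the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # character missing from hypothesis
                current[j - 1] + 1,                   # extra character in hypothesis
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the correct (reference) transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: a correct transcript and a machine reading of it.
correct = "the greatest happiness"
automatic = "the greatast hapiness"
print(f"CER: {character_error_rate(correct, automatic):.1%}")
```

In this made-up example the machine reading contains one wrong letter and one missing letter, so two edits are needed against a 22-character reference.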

In the first stage of the READ project, we have already been able to enhance the accuracy of the HTR technology. The team at the Computational Intelligence Technology Lab (CITlab) at the University of Rostock created a new model using this same dataset. This model was based on Neural Networks, computational models for machine learning which are loosely inspired by the workings of the human brain. This model can produce automatic transcripts of the Bentham papers with a Character Error Rate of only 5-10%.

Now it’s time to take things up a notch! In our first experiments with HTR, we put forward ‘easier’ documents for the computer to process. These tended to be pages written by Bentham’s secretaries where the layout is clear and the handwriting relatively neat. Now we want to test how the computer copes with some of the worst examples of Bentham’s writing. We are producing a new set of training data based on a selection of manuscripts which were written by Bentham himself when the philosopher was in his eighties.

Box xxx of the Bentham Papers in UCL Special Collections contains the Blackstone Familiarized papers. These were part of Bentham’s lifelong obsession with critiquing the work of William Blackstone, the English jurist who was most famous for his Commentaries on the Laws of England (1765-9). Bentham first turned against Blackstone as a teenage student when he attended his lectures at the University of Oxford. In several published works and unpublished papers, Bentham argued that Blackstone was an apologist for the obvious inadequacies in the English legal system and blind to the necessity of reform.

Screenshot of page from Blackstone Familiarized in Transkribus. UCL Special Collections, Bentham Papers, Box xxx, fo. 156 [Image: UCL Special Collections]

The Blackstone Familiarized papers have been digitised by UCL Creative Media Services and transcribed by Professor Philip Schofield, the Director of the Bentham Project and General Editor of the Collected Works of Jeremy Bentham. The images were uploaded to Transkribus and Chris Riley, a PhD student from the Faculty of Laws, has been marking the lines of text on each image and then copying the transcripts into the platform.

We are aiming to produce 200 pages of ‘difficult’ Bentham training data which can be fed into a new version of our latest HTR model. We are also interested in comparing the accuracy of different models. How far does this new material enhance the accuracy of the models we already have and would it be worthwhile to have separate models for Bentham himself and his secretaries?

Automated recognition of Bentham’s handwriting would considerably speed up the full transcription of Bentham’s writings and the publication of his Collected Works. We also want to experiment with HTR technology in a new version of Transcribe Bentham where volunteer transcribers could ask the computer to suggest readings of words that they find difficult to decipher. Until then, we have some more transcribing to do!

Filed under News, READ project, Transcription

