

Searching Bentham’s manuscripts with Keyword Spotting!

By Louise Seaward, on 15 October 2018

The Bentham Project has been experimenting with the Handwritten Text Recognition (HTR) of Bentham’s manuscripts for the past five years, first as a partner in the tranScriptorium project and now as part of READ.

Read about our progress with HTR and the Transkribus platform in blog posts from June 2017 and February 2018.

Keyword Spotting

Our results have thus far been impressive, especially considering the immense difficulty of Bentham’s own handwriting.  But automated transcription is not yet at a point where it is sufficiently accurate to be used by Bentham Project researchers as a basis for scholarly editing.  It would be too time-consuming (and probably too irritating!) for us to correct the errors in the computer-generated transcripts of papers written in Bentham’s hand.

However, the current state of the technology is strong enough for keyword searching!  And thanks to a collaboration with the PRHLT research center at the Universitat Politècnica de València (another partner in the READ project), we have some exciting new results to report.  It is now possible to search over 90,000 digital images of the central collections of Bentham’s manuscripts, which are held by Special Collections, University College London, and the British Library.

A Keyword Spotting search for the word ‘pleasure’

Appeal for volunteers!

I have prepared a Google Sheet with some suggested search terms in five spreadsheet tabs (Bentham’s neologisms, concepts, people, places and other).

It would be fantastic if people filled in the spreadsheet to record some of their searches, using my suggested search terms and some of their own.  Transcribers could search for subjects they are interested in and then cross-reference them with material on the Transcription Desk that they might like to transcribe.

Who knows what we might find??  I hope to share some of these results in my upcoming presentation at the Transkribus User Conference in November 2018.  Thanks in advance for your participation.

Background

The PRHLT team have processed the Bentham papers with cutting-edge HTR and probabilistic word indexing technologies.  This sophisticated form of searching is often called Keyword Spotting.  It is more powerful than a conventional full-text search because, rather than searching a single ‘best guess’ transcript, it searches the probability values that statistical models trained for text recognition assign to character sequences (words), taking into account the many possible readings of each word on a page.
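To illustrate why searching probabilities can find more than a conventional full-text search, here is a toy sketch in Python. It is not the PRHLT system: the `Reading` record, the function names, the page labels (`page_A`, `page_B`), the threshold and all the probability values are hypothetical, chosen only to show the idea that each word image keeps several candidate readings with probabilities, and that a query matches any reading above a confidence threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One candidate reading of a single word image (hypothetical record layout)."""
    page: str    # hypothetical page label
    line: int    # line number on the page
    pos: int     # word position within the line
    word: str    # candidate transcription of the word image
    prob: float  # probability the recognition model assigns to this reading


def keyword_search(index, query, threshold=0.3):
    """Return readings of `query` at or above `threshold`, most probable first."""
    q = query.lower()
    hits = [r for r in index if r.word.lower() == q and r.prob >= threshold]
    return sorted(hits, key=lambda r: r.prob, reverse=True)


def best_guess_search(index, query):
    """Baseline full-text search: keep only the most probable reading per word position."""
    best = {}
    for r in index:
        key = (r.page, r.line, r.pos)
        if key not in best or r.prob > best[key].prob:
            best[key] = r
    q = query.lower()
    return [r for r in best.values() if r.word.lower() == q]


# Toy probabilistic index: each word position keeps several alternative readings.
INDEX = [
    Reading("page_A", 3, 5, "pleasure", 0.72),
    Reading("page_A", 3, 5, "pressure", 0.21),
    Reading("page_B", 7, 2, "pleasant", 0.40),
    Reading("page_B", 7, 2, "pleasure", 0.35),
    Reading("page_B", 9, 1, "measure", 0.90),
]

if __name__ == "__main__":
    for r in keyword_search(INDEX, "pleasure"):
        print(r.page, r.line, r.pos, r.prob)
    print(len(best_guess_search(INDEX, "pleasure")))
```

In this toy index the word on `page_B` is most likely read as ‘pleasant’, so searching a single best-guess transcript misses it, while the probabilistic search still returns it (ranked lower, with its probability shown). Raising `threshold` trades recall for precision. A real system builds its index from the recognition model’s output over every word image; this sketch only shows the search side.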

We delivered thousands of images and transcripts to the team in Valencia and gave them access to the data we had already used to train HTR models in Transkribus.  After cleaning the data and using Transkribus technology to divide the images into lines, the team in Valencia trained neural network algorithms to recognise and index the collection.

The result is that this vast collection of Bentham’s papers can be efficiently searched, including those papers that have not yet been transcribed!  The accuracy rates are impressive.  The results suggest a character accuracy of around 84-94% (a Character Error Rate of 6-16%) when compared with manual transcriptions of Bentham’s manuscripts.  More precisely, laboratory tests show that average word search precision ranges from 79% to 94%.  In other words, out of every 100 search results, as few as 6 may turn out not to be the word searched for (and at the lower end of the range, around 21 may not be).  The accuracy of spotted words depends on the difficulty of Bentham’s handwriting – although it is possible to find useful results even in Bentham’s scrawl!  There could be as many as 25 million words waiting to be found.
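The figures above rest on simple arithmetic: character accuracy is taken as 100% minus the Character Error Rate, and precision is the share of returned results that really are the searched word, so the expected number of wrong hits per 100 results is 100 minus the precision. A minimal Python sketch of that arithmetic follows; the function names and the sample of ‘manually checked’ hits are hypothetical, used only for illustration.

```python
def character_accuracy_pct(cer_pct):
    """Character accuracy as used here: 100% minus the Character Error Rate."""
    return 100 - cer_pct


def precision(checked_words, query):
    """Share of returned results whose manually checked word equals the query."""
    if not checked_words:
        return 0.0
    correct = sum(1 for word in checked_words if word == query)
    return correct / len(checked_words)


def wrong_hits_per_100(precision_pct):
    """Expected number of results, out of 100, that are not the searched word."""
    return 100 - precision_pct


if __name__ == "__main__":
    # Ranges reported above, in whole percentages.
    print([character_accuracy_pct(c) for c in (6, 16)])  # [94, 84]
    print([wrong_hits_per_100(p) for p in (94, 79)])     # [6, 21]

    # Hypothetical manual check of 50 hits for 'pleasure': 47 correct, 3 misreadings.
    sample = ["pleasure"] * 47 + ["pressure"] * 3
    print(precision(sample, "pleasure"))
```

Note that a Character Error Rate is an edit-distance measure and can, in principle, exceed 100% on very poor output, so ‘100 minus CER’ is a convenient convention rather than a strict definition of accuracy.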

Use cases

This fantastic site will be invaluable to anyone interested in Bentham’s philosophy.  It will help Bentham Project researchers to find previously unknown references in pages that have not yet been transcribed.  It will allow researchers to quickly investigate Bentham’s concepts and correspondents.  I hope that it will also help volunteer transcribers to find interesting material.

This interface is a prototype beta version.  In the future we want to increase the power of this research tool by connecting it to other digital resources, allowing users to quickly search the manuscripts in the UCL library repository, the Bentham Papers database and the Transcription Desk, and by linking these images to our rich existing metadata.

Similar Keyword Spotting technology (based on research by the CITlab team at the University of Rostock, another one of the READ project partners) is currently available to all users of the Transkribus platform.  Find out more at the READ project website.

I welcome any feedback on our new search functionality at: transcribe.bentham@ucl.ac.uk

My thanks go to the PRHLT research center, the University of Innsbruck and Chris Riley, Transcription Assistant at the Bentham Project, for their support and assistance.

The Transcription Desk is open again! Migration completed successfully.

By Louise Seaward, on 10 October 2018

I am pleased to announce that the Transcribe Bentham Transcription Desk is open for business once more.

You can now access the site at a new web address: http://transcribe-bentham.ucl.ac.uk/

Please update your bookmarks with this new link!

The old address will redirect users to the new site.

Huge thanks go to Tom Couch and his team at UCL Research IT Services, who have successfully migrated the platform onto a new server at UCL and made various updates and fixes to the MediaWiki.  We also thank Co-Sector, University of London for supporting the platform since 2010.

For our volunteers, the site will hopefully not have changed much.  You can log in and continue transcribing as normal.  All previous edits should have been preserved.

Important changes:

One noticeable difference is that the JavaScript viewer is now the default mode of viewing images on the platform.  The Flash viewer is due to become obsolete by 2020 and is already blocked by default on most web browsers.  At the moment, it is still possible to view the manuscripts in the Flash viewer by clicking the options at the bottom left of an image.

Known issues:

Please bear in mind there are a few known issues with the new site.  We are working to fix the following bugs as soon as possible:

  • Preview option when transcribing

The option to view a ‘clean’ version of your transcript, without visible TEI tags, is currently broken.

  • Emails from TB Editor

Any email notifications from TB_Editor are currently marked as being sent by an account called ‘ccea038’.  This is an admin account which is linked to the server.

I would like to thank our volunteers for their continued patience.  Good luck exploring the new site and please get in touch if you have any feedback, issues or questions (transcribe.bentham@ucl.ac.uk).

Now that the site has been migrated, we can start to daydream about further improvements that we hope to make to the platform in the future too!

Update on migration of Transcription Desk website

By Louise Seaward, on 24 September 2018

Thanks to the hard work of UCL Research IT Services and Co-Sector, University of London, we are almost ready to migrate the Transcribe Bentham Transcription Desk to a new server at UCL.

The planned date of the migration is 9 October 2018. 

I hope that the migration process will not affect volunteers too much.  There will be a day or two of downtime, after which volunteers will be able to continue transcribing as normal.

UCL Research IT Services will update the MediaWiki and fix any bugs.  Having the platform hosted at UCL should also put us in a better position to make further technical improvements to it in the future.

Migration Timeline

8 October

  • Volunteers can carry on working up until midnight on 8 October.
  • All edits made by volunteers up until the end of 8 October will be preserved.

9 October

  • The site will go into maintenance mode.  There will be a notice to this effect on the current site.
  • Data will be migrated and tests carried out.
  • Volunteers should not make any edits to the current or new site at this time – any changes made after 8 October will not be saved.

10 October

I am very much looking forward to working with you all on the updated site.  If you have any questions about the migration, please get in touch.

Project update – volunteers credited for first time in Collected Works of Jeremy Bentham!

By Louise Seaward, on 6 September 2018

It’s hard to believe, but September marks the eighth birthday of Transcribe Bentham!  And we have something else to celebrate this month too…

It gives us great pleasure to announce that a large number of volunteers will be credited for the first time in the latest volumes due to be published by the Bentham Project.

Since the initiative began in September 2010, Transcribe Bentham volunteers have put an enormous amount of effort into transcribing Bentham’s writings and we owe them a huge debt of gratitude.

The thousands upon thousands of volunteer transcripts constitute a unique online resource, but they also feed directly into the work of Bentham Project researchers who are editing Bentham’s writings for publication as The Collected Works of Jeremy Bentham.  Transcripts produced by volunteers mean that researchers do not have to transcribe everything from scratch – they have an accurate first draft for further editing.  The accuracy and efficiency of volunteer transcription are discussed further in our latest article, published in Digital Scholarship in the Humanities and written by Tim Causer, Kris Grint, Anna-Maria Sichani and Melissa Terras.

Pre-publication versions of two volumes of Bentham’s Collected Works are now available in open access, in advance of their forthcoming publication by Oxford University Press.

11 volunteers have been credited in Writings on Political Economy, vol. III (see p. viii of the ‘Editorial Introduction’).

34 volunteers have been credited in Writings on Australia (see the ‘Editorial Introductions’ to ‘History of Jeremy Bentham’s dealings with Lord Pelham’, p. v; ‘Letter to Lord Pelham’, p. viii; ‘Second Letter to Lord Pelham’, pp. vi-vii; ‘Third Letter to Lord Pelham’, p. vi; and ‘A Plea for the Constitution’, p. ix).

Both volumes can be consulted freely online and will interest anyone intrigued by Bentham’s ideas on economics, crime and colonies.

Writings on Political Economy, vol. III contains some of Bentham’s drafts from the later 1790s relating to the reform of the policing of the River Thames and to the establishment of a Board of Police in London to administer a licensing system for the sale of second-hand goods.

Writings on Australia consists of seven texts, four of which are made available for the first time. Six of the texts are intimately connected to Bentham’s attempt to persuade the British government to build his panopticon penitentiary. They include the ‘Letters to Lord Pelham’ and ‘A Plea for the Constitution’, which were highly influential philosophical and legal critiques of convict transportation and the New South Wales penal colony, and were tools by which, in 1802-3, Bentham hoped to force the government to proceed with the establishment of the panopticon.

The seventh text, ‘Colonization Company Proposal’, which was written in 1831, effectively constitutes Bentham’s commentary on the National Colonization Society’s Proposal to His Majesty’s Government for Founding a Colony on the Southern Coast of Australia.

‘A Direct South View of the Town of Sydney taken from the brow of the hill leading to the flag staff’, from ‘Account of the English Colony of New South Wales’ by David Collins, on which Bentham drew extensively when writing on Australia in 1802-3.

We hope our volunteers are suitably proud to see their names in lights.  Transcribe Bentham would be nothing without their enthusiasm and diligence and we are truly thankful.  Their contribution to The Collected Works of Jeremy Bentham really shows how crowdsourcing can make a meaningful connection between academia and the general public.

If you would like the chance to be credited in a future volume of Bentham’s writings, sign up for an account at our Transcription Desk and start transcribing!

Project update – the challenge continues!

By Louise Seaward, on 20 August 2018

Last month, I set a new challenge for our volunteer transcribers.  Could we focus attention on certain boxes to try to get them completely transcribed?

Happily, our volunteers rose to the challenge and set about transcribing.  I’m here to give a progress update and to invite everyone to continue transcribing so we can build on our first milestone.

The transcription challenge brought lots of new visitors to the site, tempted several new transcribers to start work and encouraged a number of old friends to rejoin us.  This is exactly what we like to see!

As the table below shows, at the last count volunteers had transcribed 63 pages of the targeted material.  Thank you all!

Box number       Pages transcribed
14               3
50               15
70               21
72               4
95               6
537 and 538      14
TOTAL            63

Most of the 63 pages are completely transcribed; some are still awaiting review from the Transcribe Bentham administrators.  Having worked with many of our volunteers for years, we are confident that these transcripts are of a high standard.

Let’s keep the transcription challenge going and see how far we can get.  Outstanding material can be found in the tables below – there are still pages from Boxes 50, 70, 72, 95, 537 and 538 to be transcribed.  Box 14 is now very nearly complete.  The only remaining pages are written in shorthand, which cannot be transcribed in full in our current system.

I know that many of these pages are rather difficult – but hopefully transcribers will be able to work on a page here and there, alongside their usual material.

I am also hard at work preparing our next transcription challenge, which will be centred on material that is likely to be of more immediate use to researchers at the Bentham Project.

As ever, I am amazed by everyone’s efforts and I look forward to seeing how transcription progresses.  If you have any questions or comments about the challenge, please let me know by email (transcribe.bentham@ucl.ac.uk).

Box 50

Page number      Content                        Difficulty of handwriting   Foreign language?
JB/050/174/001   Legal procedure (table form)   Difficult                   French

Box 70

Page number      Content           Difficulty of handwriting   Foreign language?
JB/070/149/001   Laws in general   Moderate                    Latin?
JB/070/234/001   Larceny           Difficult
JB/070/237/001   Larceny           Difficult
JB/070/238/001   Larceny           Difficult
JB/070/245/001   Larceny           Difficult
JB/070/246/001   Larceny           Difficult
JB/070/247/001   Larceny           Difficult
JB/070/249/001   Larceny           Difficult
JB/070/252/002   Larceny           Difficult
JB/070/255/001   Larceny           Difficult
JB/070/256/002   Larceny           Difficult
JB/070/261/001   Larceny           Difficult

Box 72

Page number      Content                                               Difficulty of handwriting   Foreign language?
JB/072/004/001   Offences against person (table form)                  Moderate
JB/072/006/001   Offences against person (table form)                  Moderate
JB/072/006/002   Offences against person (table form)                  Moderate
JB/072/007/001   Offences against property (table form)                Moderate
JB/072/007/002   Offences against property (table form)                Moderate
JB/072/008/001   Offences against property (table form)                Moderate
JB/072/008/002   Offences against property (table form)                Moderate
JB/072/009/001   Penal code (table form)                               Moderate
JB/072/010/001   Offences against revenue (table form)                 Moderate
JB/072/010/002   Offences against revenue (table form)                 Moderate
JB/072/011/001   Offences against trade (table form)                   Moderate
JB/072/012/001   Offences against morality (table form)                Moderate
JB/072/014/001   Offences against public property (short table form)   Moderate
JB/072/016/001   Offences against government (table form)              Moderate
JB/072/016/002   Offences against government (table form)              Moderate
JB/072/018/001   Offences against national peace (short table form)    Moderate
JB/072/019/001   Offences against the coin (table form)                Moderate
JB/072/090/001   Offences against property (table form)                Moderate
JB/072/106/002   Penal code                                            Moderate
JB/072/183/002   Penal code                                            Difficult                   French
JB/072/183/003   Penal code                                            Difficult                   French
JB/072/183/004   Penal code                                            Difficult                   French
JB/072/184/001   Penal code                                            Difficult                   French
JB/072/186/001   Penal code                                            Difficult                   French
JB/072/215/001   Penal code (table form)                               Difficult                   French
JB/072/216/001   Penal code                                            Difficult                   French
JB/072/216/002   Penal code                                            Difficult                   French
JB/072/216/003   Penal code                                            Difficult                   French
JB/072/216/004   Penal code                                            Difficult                   French
JB/072/217/001   Penal code                                            Difficult                   French
JB/072/219/001   Penal code                                            Moderate                    French
JB/072/219/002   Penal code                                            Moderate                    French
JB/072/220/001   Penal code                                            Moderate                    French
JB/072/220/002   Penal code                                            Moderate                    French
JB/072/220/003   Penal code                                            Moderate                    French
JB/072/220/004   Penal code                                            Moderate                    French
JB/072/221/001   Penal code                                            Moderate                    French
JB/072/221/002   Penal code                                            Moderate                    French
JB/072/221/003   Penal code                                            Moderate                    French
JB/072/221/004   Penal code                                            Moderate                    French
JB/072/222/001   Penal code                                            Moderate                    French

Box 95

Page number      Content                     Difficulty of handwriting   Foreign language?
JB/095/050/001   Turnpike act (table form)   Difficult
JB/095/059/001   Turnpike act                Difficult
JB/095/060/001   Turnpike act                Difficult
JB/095/071/002   Turnpike act                Difficult
JB/095/077/002   Turnpike act                Difficult
JB/095/079/001   Turnpike act                Difficult
JB/095/079/002   Turnpike act                Difficult
JB/095/096/001   Turnpike act                Moderate
JB/095/097/001   Turnpike act                Difficult
JB/095/105/001   Turnpike act                Difficult
JB/095/109/001   Turnpike act                Difficult
JB/095/110/001   Turnpike act                Difficult
JB/095/112/004   Turnpike act                Difficult

Boxes 537 and 538

Page number      Content                    Difficulty of handwriting   Foreign language?
JB/537/363/002   Jeremy to Samuel Bentham   Difficult                   French
JB/537/364/001   Jeremy to Samuel Bentham   Difficult                   French
JB/537/364/002   Jeremy to Samuel Bentham   Difficult                   French
JB/537/365/001   Jeremy to Samuel Bentham   Difficult                   French
JB/538/412/001   Jeremy to Samuel Bentham   Moderate                    French
JB/538/412/002   Jeremy to Samuel Bentham   Moderate                    French