Chapter 6: Text encoding and scholarly digital editions

Julianne Nyhan

Chapter Overview

Despite the fascinating work being carried out in image-based computing at UCL, it is, of course, important to remember that digital texts still offer many exciting opportunities for new work in digital humanities, as Julianne Nyhan shows in Chapter 6. She reflects on how a digital text created using digital humanities methodologies and techniques tends to differ from other kinds of digital text. She asks what the Text Encoding Initiative (TEI) is and how it can be used, and gives an overview of its advantages and disadvantages. Two case studies illustrate current practice: the ‘Webbs on the Web’ project and the ‘DALF’ project. The chapter closes by pointing to key resources for teaching and learning TEI.

Case Study: The Webbs on the Web Project

Ed Fay, Digital Library Manager, London School of Economics

The manuscript and printed works of Sidney and Beatrice Webb are among the founding collections of the library at LSE. Their works are still regularly requested by researchers, and Beatrice Webb’s extensive diary is a key resource for research into a wide range of subjects, including politics in the late 19th and early 20th centuries, industrial relations, the role of women in society, and family relationships.

The diaries were chosen as the launch collection for the new LSE digital library, with funding from the Webb Memorial Trust. LSE is one of the first academic libraries to provide a digital library, a service that is becoming increasingly necessary because libraries are required to collect, preserve and provide access to digital material. This need is intensified by the popularity of social media and its importance as a historical record, particularly for an institution like LSE. The Webbs on the Web Project provides a single gateway to the published and unpublished works of Beatrice and Sidney Webb. It provides:

• online access to a bibliography of published works, with links to digital versions, where they are available
• digitization, transcription and online access to the typescript and manuscript versions of Beatrice Webb’s diary, with search and browse functionality
• an online gallery of images of Beatrice and Sidney Webb.

All 9000 pages of the original manuscript were made available, together with 8000 pages of a transcribed version that is cross-referenced to the manuscript through indexed date fields, so that the two versions can be viewed side by side for comparison. Incorporating TEI into the LSE digital library’s technology stack made it possible to add functionality specifically for textual comparison. As a result, the LSE digital library not only makes digital content available but also enriches it so that it can better meet research needs.

The general approach was to use an instance of the institutional repository (E-Prints) for the online bibliography, holding bibliographic metadata (Dublin Core, DC) and full-text publications (PDF) where these already existed in digital form. The digital library (Fedora) holds the scanned diaries (TIFF) and the transcribed, marked-up text (TEI/XML), and supports their indexing and online presentation through a custom website/application. External suppliers were contracted to digitize and transcribe the manuscript and typescript diaries.
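
To make the idea of cross-referencing by date more concrete, the following Python sketch indexes dated diary entries in a TEI transcription against the scanned manuscript pages on which they begin. It assumes a TEI P5 encoding in which each entry is a <div type="entry"> containing a <date when="...">, and page breaks are recorded as <pb facs="..."/> pointing to image files. This markup, the invented sample entries, the file names and the function index_entries_by_date are hypothetical illustrations, not the LSE project’s actual encoding or code.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical TEI P5 fragment with invented entries: <pb facs="..."/> marks the
# start of each scanned manuscript page; each entry is a <div type="entry"> with a <date>.
SAMPLE = f"""<text xmlns="{TEI_NS}"><body>
  <pb n="1" facs="ms_0001.tif"/>
  <div type="entry"><head><date when="1883-03-01">1 March 1883</date></head>
    <p>First entry.</p></div>
  <div type="entry"><head><date when="1883-03-04">4 March 1883</date></head>
    <p>Second entry.</p></div>
  <pb n="2" facs="ms_0002.tif"/>
  <div type="entry"><head><date when="1883-03-10">10 March 1883</date></head>
    <p>Third entry.</p></div>
</body></text>"""


def index_entries_by_date(xml_text: str) -> dict[str, str]:
    """Map each entry's ISO date (@when) to the image of the page it begins on."""
    root = ET.fromstring(xml_text)
    index: dict[str, str] = {}
    current_facs = None
    # iter() visits elements in document order, so the last <pb> seen
    # before an entry is the page on which that entry begins.
    for el in root.iter():
        if el.tag == f"{{{TEI_NS}}}pb":
            current_facs = el.get("facs")
        elif el.tag == f"{{{TEI_NS}}}div" and el.get("type") == "entry":
            date = el.find(".//tei:date", NS)
            when = date.get("when") if date is not None else None
            if when and current_facs:
                index[when] = current_facs
    return index


if __name__ == "__main__":
    for when, facs in sorted(index_entries_by_date(SAMPLE).items()):
        print(when, "->", facs)
```

A presentation layer could use such an index to display the matching manuscript page beside each transcribed entry.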

Case Study: DALF: Digital Archive of Letters in Flanders

Ron Van den Branden and Bert Van Raemdonck, Centre for Scholarly Editing and Document Studies, Royal Academy of Dutch Language and Literature

In 2003, the Centre for Scholarly Editing and Document Studies (of the Royal Academy of Dutch Language and Literature) launched the DALF project. DALF (Digital Archive of Letters in Flanders) was envisioned as a growing textbase of letters that would generate different products for both academia and a wider audience, thus providing a tool for diverse research disciplines. Content for the textbase was provided by several separate edition projects.

To ensure maximum flexibility and (re)usability for each of the digital DALF editions, a formal framework was required. In line with international standards for digital text encoding, DALF encoded letters in XML and defined their structure in a Document Type Definition (DTD). To align with international efforts to define markup schemes, DALF also took into account the insights presented in projects such as TEI (Text Encoding Initiative), MASTER (Manuscript Access through Standards for Electronic Records) and MEP (Model Editions Partnership).

At the time, the Text Encoding Initiative had just published the fourth proposal (P4) of its Guidelines for Electronic Text Encoding and Interchange. These guidelines provided an excellent starting point for many of the features we wanted to encode in letters. However, some letter-specific features required additional means of encoding. Because the TEI scheme could (and still can) be extended, we developed the DALF DTD as an extension of TEI.

We selected several subsets of standard TEI elements and added some new elements and attributes to them. Most of the new features, such as an identifying description of the letter (<letIdentifier>), a description of its most salient communicative parameters (<letHeading>), a description of its physical characteristics (<physDesc>) and the presence of an envelope (<envOcc/>), were placed in <letDesc>, a letter-specific section in the DALF header. Letter-specific structural text elements were also added for postscripts (<ps>), envelopes (<envelope>) and calculations (<calc>). The modifications to the TEI P4 guidelines also included redefining <add>, <note> and <seg> as global elements.
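
The following Python sketch suggests how software might use some of these letter-specific elements: it reads a small DALF-style letter and reports its identifier, whether an envelope is recorded, and the text of any postscripts. Only the element names (<letDesc>, <letIdentifier>, <envOcc/>, <ps>) are taken from the description above; their nesting, the treatment of <envOcc/> as an empty flag, the invented sample content and the function summarize_letter are assumptions, so the sample is not a valid DALF document.

```python
import xml.etree.ElementTree as ET

# Invented DALF-style letter. TEI P4 XML documents carry no namespace.
# Only the element names come from the DALF description; the nesting is assumed.
SAMPLE_LETTER = """<TEI.2>
  <teiHeader>
    <letDesc>
      <letIdentifier>Letter 42</letIdentifier>
      <envOcc/>
    </letDesc>
  </teiHeader>
  <text><body>
    <div type="letter"><p>Dear friend,</p></div>
    <ps><p>P.S. Greetings to all.</p></ps>
  </body></text>
</TEI.2>"""


def summarize_letter(xml_text: str) -> dict:
    """Report a letter's identifier, envelope flag and postscript texts."""
    root = ET.fromstring(xml_text)
    ident = root.find(".//letDesc/letIdentifier")
    return {
        "identifier": ident.text.strip() if ident is not None and ident.text else None,
        # <envOcc/> is treated here as an empty flag: its presence records an envelope.
        "has_envelope": root.find(".//letDesc/envOcc") is not None,
        # itertext() collects all text inside each <ps>, ignoring nested tags.
        "postscripts": ["".join(ps.itertext()).strip() for ps in root.iter("ps")],
    }


if __name__ == "__main__":
    print(summarize_letter(SAMPLE_LETTER))
```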

The most important digital edition of letters published through the DALF framework so far is the edition of 1500 letters dealing with the Flemish literary journal Van Nu en Straks (1893–1901). Its web interface is driven entirely by the eXist XML database and allows users to browse, search, view and export the encoded letters, or custom selections of them, in XHTML, XML or PDF. The letters can be displayed as reading text, as a diplomatic transcription or as XML source, and facsimiles are offered where available. Most of the 1500 letters are in Dutch; 180 are in French.
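
One advantage of this kind of markup is that a reading text and a diplomatic transcription can both be generated from a single encoded source. The Python sketch below illustrates the principle with three common TEI elements: <del> (a deletion), <add> (an addition) and <abbr> carrying its expansion in an expan attribute, in the attribute style of TEI P4. The diplomatic view keeps deletions and abbreviations as written, while the reading view drops deletions and expands abbreviations. The bracket conventions ([-...-] for deletions, [+...+] for additions), the invented sample sentence and the function names are hypothetical; this is not how the Van Nu en Straks edition is actually implemented.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence using P4-style markup (expansion held in @expan).
SAMPLE = ('<p>Dear <abbr expan="Mister">Mr</abbr> Smith, I send you the manuscript '
          '<del>tomorrow</del><add>today</add>.</p>')


def render(el: ET.Element, mode: str) -> str:
    """Render one element in 'diplomatic' or 'reading' mode."""
    if el.tag == "del":
        # Diplomatic view keeps the deleted words (marked); reading text drops them.
        return f"[-{render_children(el, mode)}-]" if mode == "diplomatic" else ""
    if el.tag == "add":
        # Diplomatic view marks the addition; reading text absorbs it silently.
        inner = render_children(el, mode)
        return f"[+{inner}+]" if mode == "diplomatic" else inner
    if el.tag == "abbr" and mode == "reading" and el.get("expan"):
        return el.get("expan")
    return render_children(el, mode)


def render_children(el: ET.Element, mode: str) -> str:
    parts = [el.text or ""]
    for child in el:
        parts.append(render(child, mode))
        # Tail text lies outside the child element, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def transcribe(xml_text: str, mode: str) -> str:
    if mode not in ("diplomatic", "reading"):
        raise ValueError(f"unknown mode: {mode!r}")
    return render(ET.fromstring(xml_text), mode)


if __name__ == "__main__":
    print(transcribe(SAMPLE, "diplomatic"))
    print(transcribe(SAMPLE, "reading"))
```

In a production edition the same separation is more commonly expressed in XSLT stylesheets, but the logic is the same: one source, several rule sets for display.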

Because DALF was defined as an extension to the TEI P4 guidelines, it does not comply with the most recent version of those guidelines. When the TEI published the fifth proposal of its guidelines (P5) in November 2007, five years of continued support for projects using TEI P4 was guaranteed. The Centre for Scholarly Editing and Document Studies is therefore planning to update DALF before November 2012. Since some of the sources of inspiration for the original DALF DTD have been incorporated into the TEI P5 encoding scheme (see, for example, the section), simply redefining DALF in terms of TEI P5 could result in semantic and structural overlap. Updating to TEI P5 is also a good occasion to evaluate how DALF might be optimized to better meet the needs of scholarly editors who want to encode correspondence. A thorough analysis of how DALF 2.0 should be designed is therefore necessary, and this analysis is ongoing. In designing DALF 2.0, the Centre for Scholarly Editing and Document Studies will also take into consideration the remarks and suggestions made by the TEI’s Special Interest Group for correspondence (see the DALF Guidelines: www.kantl.be/ctb/project/dalf/dalfdoc/; Van Nu en Straks, The Letters: www.vnsbrieven.org/VNS/).
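
Part of a P4-to-P5 migration is purely mechanical; the harder questions raised above are semantic. The Python sketch below applies a few well-known surface changes (the P5 namespace http://www.tei-c.org/ns/1.0, the P4 root element <TEI.2> renamed to <TEI>, and the P4 id and lang attributes replaced by xml:id and xml:lang) and flags any element whose name is also defined in P5, such as <physDesc>, as a candidate for the kind of overlap described above. The function name, the invented sample and the caller-supplied set of P5 names are assumptions. For simplicity the sketch also moves DALF’s own elements into the TEI namespace, whereas a real P5 customization, expressed in an ODD, would normally keep them in a separate project namespace.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Invented P4-style input: no namespace, root <TEI.2>, plain id/lang attributes.
SAMPLE_P4 = """<TEI.2 lang="nl">
  <teiHeader><letDesc><physDesc>Two leaves, folded.</physDesc></letDesc></teiHeader>
  <text><body><p id="p1">Beste vriend,</p></body></text>
</TEI.2>"""


def migrate_p4_surface(xml_text: str, p5_names: set[str]) -> tuple[str, list[str]]:
    """Apply a few mechanical P4 -> P5 changes and return (new XML, warnings).

    Assumes the input has no namespace, as P4 XML documents normally do.
    """
    root = ET.fromstring(xml_text)
    warnings: list[str] = []
    for el in root.iter():
        local = el.tag
        if local == "TEI.2":
            local = "TEI"  # the P5 root element
        elif local in p5_names:
            # Same name as a P5 element: the two definitions may not match.
            warnings.append(f"<{local}> is also defined in TEI P5; check its semantics")
        el.tag = f"{{{TEI_NS}}}{local}"
        # P4 global attributes id and lang become xml:id and xml:lang in P5.
        for name in ("id", "lang"):
            if name in el.attrib:
                el.set(f"{{{XML_NS}}}{name}", el.attrib.pop(name))
    # Serialize TEI elements without a prefix (updates ElementTree's global prefix registry).
    ET.register_namespace("", TEI_NS)
    return ET.tostring(root, encoding="unicode"), warnings


if __name__ == "__main__":
    new_xml, notes = migrate_p4_surface(SAMPLE_P4, {"physDesc"})
    print(new_xml)
    print(notes)
```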

Further Reading

TEI resources

• The best introduction to XML is the second chapter of the TEI P4 Guidelines, ‘A Gentle Introduction to XML’, www.tei-c.org/P4X/SG.html.
• All of the tutorials and exercises in TEI by Example, http://tbe.kantl.be/TBE/.
• The ‘Getting started with P5 ODD’ tutorial, www.tei-c.org/Guidelines/Customization/odds.xml.
• Materials of the Digital.Humanities@Oxford Summer School, http://digital.humanities.ox.ac.uk/DHSS2011/.
• Brown Women Writers Project seminar materials on scholarly text encoding, www.wwp.brown.edu/outreach/seminars/seminar_list.html.

Web Resources

• The MLA Guidelines for Editors of Scholarly Editions, www.mla.org/cse_guidelines
• Journal of the Text Encoding Initiative, http://journal.tei-c.org/journal/index
• Many excellent tutorials on HTML, XHTML, XML and related technologies are available on the W3Schools website, www.w3schools.com/