Beals, M. H. and Emily Bell, with contributions by Ryan Cordell, Paul Fyfe, Isabel Galina Russell, Tessa Hauswedell, Clemens Neudecker, Julianne Nyhan, Mila Oiva, Sebastian Padó, Miriam Peña Pimentel, Lara Rose, Hannu Salmi, Melissa Terras, and Lorella Viola. The Atlas of Digitised Newspapers and Metadata: Reports from Oceanic Exchanges. Loughborough: 2020. DOI: 10.6084/m9.figshare.11560059
The Oceanic Exchanges team has just published a substantial open access resource that will advance the state of the art in cross-collection text analysis of selected North-Atlantic and Anglophone-Pacific retrodigitised nineteenth-century newspapers. We also hope that the approach set out in the report will be taken up by other researchers who wish to engage in foundational research on cross-collection computational analysis. As the project notes:
the rise of digitisation promises great opportunities for those who wish to engage with newspaper archives, but as with all historical archives, digital collections require researchers to be mindful of their shape, provenance and structure before any conclusion can be drawn. It is the responsibility of both digitiser and researcher to understand both the map and the terrain (see here).
The numerous newspaper digitisation projects that have been undertaken in recent years have resulted in the remediation of many millions of pages of nineteenth-century newspapers. Yet researchers who wish to pursue questions about global history, for example, have often found it difficult to carry out data-driven research across those digitised collections. As our report discusses, there are many reasons for this: digitisation projects are often undertaken in national settings, yet newspapers often participate in global conversations; standards that can overarch and integrate numerous, disparate digital newspaper collections have not been implemented; the shape and scope of digitised newspaper collections are informed by a multiplicity of situated contexts, which can be difficult for those external to digitisation projects to establish; and, though digital newspapers are often encoded in line with METS/ALTO, for example, notable variations exist in how those metadata specifications are applied to digital newspaper collections.
To respond to this, and to further research that takes place across digital newspaper collections, this 200-page report brings together qualitative data, metadata and paradata about selected digitised newspaper databases. It provides crucial historical and contextual information about the circumstances under which those collections came into being. It also provides a textual ontology that describes the relationships between the informational units of which the respective databases are comprised, between the data and metadata of the different collections, and between analogue newspapers and their retrodigitised representations. Also included are maps which support the visual inspection and comparison of data across disparate newspaper collections, along with JSON or XPath paths to the data.
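To give a rough sense of what following a path into such data can look like, here is a minimal sketch in Python. It is not taken from the report: the ALTO fragment is hand-written for illustration, the function name alto_lines is hypothetical, and real collections may nest or name their elements differently, which is exactly the kind of variation the report documents. The sketch assumes Python 3.8 or later, where the standard library's ElementTree accepts "{*}" as a namespace wildcard, so the same path works whichever ALTO version a file declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment (not from any real collection).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Latest"/>
            <SP/>
            <String CONTENT="intelligence"/>
          </TextLine>
          <TextLine>
            <String CONTENT="from"/>
            <SP/>
            <String CONTENT="abroad"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def alto_lines(xml_text):
    """Return the text of each TextLine, whatever ALTO namespace is declared."""
    root = ET.fromstring(xml_text)
    lines = []
    # "{*}" matches any namespace (Python 3.8+), so the ALTO version does not matter.
    for line in root.iterfind(".//{*}TextLine"):
        # Each word is stored in the CONTENT attribute of a String element.
        words = [s.get("CONTENT", "") for s in line.iterfind("{*}String")]
        lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

A collection that wraps its text differently (for example, with additional composed blocks or its own article-level segmentation) would need a different path, which is why per-collection metadata maps are useful.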
This report has come about in the context of the Oceanic Exchanges (2017-19) project (of which UCLDHers Julianne Nyhan was UK PI and Tessa Hauswedell was UCL Research Associate). The project was funded through the Transatlantic Partnership for Social Sciences and Humanities 2016 Digging into Data Challenge, and brought together leading efforts in computational periodicals research from six countries—Finland, Germany, Mexico, the Netherlands, the United Kingdom, and the United States—to examine patterns of information flow across national and linguistic boundaries.
The project is also immensely grateful to the many groups and organisations involved in the digitisation of historical newspapers who agreed to be interviewed and consulted during the process of researching the report. You can find the report, metadata maps and other resources here: https://www.digitisednewspapers.net/