Corpora, Catalogues and Correspondence: The Item-Level Identification and Digitisation of Business Letters for the British Telecom Correspondence Corpus

Ralph Morton, Hilary Nesi

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

This paper explores some of the challenges in working with archive material to produce language corpora. It takes as a case study the British Telecom Correspondence Corpus (BTCC), which contains a selection of the letters held in the BT Archives, housed in Holborn Telephone Exchange. One of the essential differences between a corpus and an archive is that a corpus is intended to be representative of a language variety. Material makes its way into historical archives in a variety of ways, and whilst archives may preserve a breadth of material, they are not generally collected to be representative, nor are they primarily designed to facilitate linguistic investigation. Work on the BTCC began as part of a Jisc-funded project to digitise the BT Archives and create a ‘research resource for the higher education sector’ (Hay, 2014:12). The BT Digital Archives became available to the public in July 2013. Our experiences using this resource inform the second half of the paper, in particular regarding the identification of corpus material and the difficulty of identifying letters at an item level. This leads to a wider discussion of how best to digitise physical archives.
Original language: English
Title of host publication: Proceedings of the Digital Humanities Congress 2014
Editors: Clare Mills, Michael Pidd, Jessica Williams
Place of Publication: Sheffield
Publisher: HRI Online Publications
Publication status: Published - 2016
Event: Digital Humanities Congress 2014 - Sheffield University, Sheffield, United Kingdom
Duration: 4 Sep 2014 to 6 Sep 2014
https://www.digitalpanopticon.org/?p=618

Conference

Conference: Digital Humanities Congress 2014
Country: United Kingdom
City: Sheffield
Period: 4/09/14 to 6/09/14

Bibliographical note

The full text is also available from http://www.hrionline.ac.uk/openbook/chapter/dhc2014-morton
This is an open access publication with a Creative Commons Attribution-NoDerivatives 4.0 International License.

Cite this

Morton, R., & Nesi, H. (2016). Corpora, Catalogues and Correspondence: The Item-Level Identification and Digitisation of Business Letters for the British Telecom Correspondence Corpus. In C. Mills, M. Pidd, & J. Williams (Eds.), Proceedings of the Digital Humanities Congress 2014. Sheffield: HRI Online Publications.