Corpora, Catalogues and Correspondence: The Item-Level Identification and Digitisation of Business Letters for the British Telecom Correspondence Corpus

Ralph Morton, Hilary Nesi

Research output: Chapter in Book/Report/Conference proceeding › Chapter


Abstract

This paper explores some of the challenges in working with archive material to produce language corpora. It takes as a case study the British Telecom Correspondence Corpus (BTCC), which contains a selection of the letters held in the BT Archives, housed in Holborn Telephone Exchange. One of the essential differences between a corpus and an archive is that a corpus is intended to be representative of a language variety. Material makes its way into historical archives in a variety of ways, and whilst archives may preserve a breadth of material, they are not generally collected to be representative, nor are they primarily designed to facilitate linguistic investigation. Work on the BTCC began as part of a Jisc-funded project to digitise the BT Archives and create a ‘research resource for the higher education sector’ (Hay, 2014:12). The BT Digital Archives became available to the public in July 2013. Our experiences using this resource inform the second half of the paper, in particular regarding the identification of corpus material and the difficulty in identifying letters at an item level. This leads to a wider discussion of how best to digitise physical archives.
Original language: English
Title of host publication: Proceedings of the Digital Humanities Congress 2014
Editors: Clare Mills, Michael Pidd, Jessica Williams
Place of Publication: Sheffield
Publisher: HRI Online Publications
Publication status: Published - 2016
Event: Digital Humanities Congress 2014 - Sheffield University, Sheffield, United Kingdom
Duration: 4 Sep 2014 → 6 Sep 2014
https://www.digitalpanopticon.org/?p=618

Conference

Conference: Digital Humanities Congress 2014
Country: United Kingdom
City: Sheffield
Period: 4/09/14 → 6/09/14

Fingerprint

Business Letter
Digitization
Resources
Letters
Language Varieties
Language Corpora
Physical
Digital Archive
Telephone

Bibliographical note

The full text is also available from http://www.hrionline.ac.uk/openbook/chapter/dhc2014-morton
This is an open access publication with a Creative Commons Attribution-NoDerivatives 4.0 International License.

Cite this

Morton, R., & Nesi, H. (2016). Corpora, Catalogues and Correspondence: The Item-Level Identification and Digitisation of Business Letters for the British Telecom Correspondence Corpus. In C. Mills, M. Pidd, & J. Williams (Eds.), Proceedings of the Digital Humanities Congress 2014. Sheffield: HRI Online Publications.

@inbook{d3dd4e000cd0483ca80eac15db75ddcb,
title = "Corpora, Catalogues and Correspondence: The Item-Level Identification and Digitisation of Business Letters for the British Telecom Correspondence Corpus",
abstract = "This paper explores some of the challenges in working with archive material to produce language corpora. It takes as a case study the British Telecom Correspondence Corpus (BTCC), which contains a selection of the letters held in the BT Archives, housed in Holborn Telephone Exchange. One of the essential differences between a corpus and an archive is that a corpus is intended to be representative of a language variety. Material makes its way into historical archives in a variety of ways, and whilst archives may preserve a breadth of material, they are not generally collected to be representative, nor are they primarily designed to facilitate linguistic investigation. Work on the BTCC began as part of a Jisc-funded project to digitise the BT Archives and create a ‘research resource for the higher education sector’ (Hay, 2014:12). The BT Digital Archives became available to the public in July 2013. Our experiences using this resource inform the second half of the paper, in particular regarding the identification of corpus material and the difficulty in identifying letters at an item level. This leads to a wider discussion of how best to digitise physical archives.",
author = "Ralph Morton and Hilary Nesi",
note = "The full text is also available from http://www.hrionline.ac.uk/openbook/chapter/dhc2014-morton This is an open access publication with a Creative Commons Attribution-NoDerivatives 4.0 International License.",
year = "2016",
language = "English",
editor = "Clare Mills and Michael Pidd and Jessica Williams",
booktitle = "Proceedings of the Digital Humanities Congress 2014",
publisher = "HRI Online Publications",
address = "Sheffield",
}

TY - CHAP

T1 - Corpora, Catalogues and Correspondence

T2 - The Item-Level Identification and Digitisation of Business Letters for the British Telecom Correspondence Corpus

AU - Morton, Ralph

AU - Nesi, Hilary

N1 - The full text is also available from http://www.hrionline.ac.uk/openbook/chapter/dhc2014-morton This is an open access publication with a Creative Commons Attribution-NoDerivatives 4.0 International License.

PY - 2016

Y1 - 2016

AB - This paper explores some of the challenges in working with archive material to produce language corpora. It takes as a case study the British Telecom Correspondence Corpus (BTCC), which contains a selection of the letters held in the BT Archives, housed in Holborn Telephone Exchange. One of the essential differences between a corpus and an archive is that a corpus is intended to be representative of a language variety. Material makes its way into historical archives in a variety of ways, and whilst archives may preserve a breadth of material, they are not generally collected to be representative, nor are they primarily designed to facilitate linguistic investigation. Work on the BTCC began as part of a Jisc-funded project to digitise the BT Archives and create a ‘research resource for the higher education sector’ (Hay, 2014:12). The BT Digital Archives became available to the public in July 2013. Our experiences using this resource inform the second half of the paper, in particular regarding the identification of corpus material and the difficulty in identifying letters at an item level. This leads to a wider discussion of how best to digitise physical archives.

M3 - Chapter

BT - Proceedings of the Digital Humanities Congress 2014

A2 - Mills, Clare

A2 - Pidd, Michael

A2 - Williams, Jessica

PB - HRI Online Publications

CY - Sheffield

ER -