Abstract
This paper provides an overview of the British Academic Written English (BAWE) corpus, and reports on two multidimensional analyses of the corpus, focussing on the findings relating to informational production and density. The corpus contains about 6.5 million words of proficient university student writing collected from British universities in the first decade of the 21st century, categorised in terms of ‘genre families’ and distributed fairly equally across levels of study and disciplinary groupings (see www.coventry.ac.uk/bawe). BAWE has been named as a major data source in more than 80 publications, and has been examined from many linguistic perspectives, for example in terms of tense, modality and lexical cohesion.
The techniques of multidimensional analysis (MDA) complement studies of individual linguistic features, because they allow many features to be examined simultaneously and their distribution to be mapped across different datasets. In the case of BAWE, MDA can be used to compare linguistic features across levels of study, disciplines and genre families. The first BAWE MDA study used the dimensions identified by Biber (1988) when comparing spoken and written registers. The second, BAWE2016, identified new BAWE-specific dimensions and provides a more fine-grained characterisation of the writing produced by British university students.
Biber’s (1988) Dimension 1, ‘Involved versus Informational Production’, contrasts verbal and nominal styles. More ‘informational’ texts have negative scores on this dimension: they have lower frequencies of present tense verbs, private verbs, the pro-verb DO, contractions, and first- and second-person pronouns, and higher frequencies of long words, nouns, attributive adjectives and prepositions. Scores on this dimension were negative across all subsets of BAWE, and became more negative at more advanced levels of study, eventually reaching levels equivalent to those of published academic prose. This may indicate progression towards a more ‘academic’ writing style. BAWE2016 Dimension 4, ‘Informational Density’, gave high scores to texts containing more noun groups and fewer verb groups, more nominalisations of verbs and adjectives, and more abstract nouns and long words. Again, texts at higher levels of study tended to cluster at the informational end of this dimension.
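In Biber’s approach, a text’s score on a dimension is obtained by standardising the normalised frequency of each feature (as a z-score) and then adding the z-scores of the features on one pole and subtracting those on the other. The Python sketch below illustrates this for Dimension 1. It assumes a simplified feature set limited to the features named above (Biber’s full dimension uses many more), and the reference norms and the two texts are entirely hypothetical; it is not the procedure or the data used in the BAWE analyses. When a new corpus is compared with Biber’s registers, standardisation typically uses fixed reference norms rather than the corpus’s own means, which is what allows every subset of a corpus to fall on the same side of zero, as the BAWE Dimension 1 scores did.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """Reference-corpus mean and standard deviation for one feature."""
    mean: float
    sd: float


# Simplified subset of Dimension 1 features (only those named in the abstract).
# Frequencies are assumed to be per 1,000 words; word length is mean characters per word.
INVOLVED = ("present_tense", "private_verbs", "pro_verb_do",
            "contractions", "first_second_person")
INFORMATIONAL = ("word_length", "nouns", "attributive_adjectives", "prepositions")


def z_score(value: float, norm: Norm) -> float:
    """Standardise one feature value against its reference norm."""
    return (value - norm.mean) / norm.sd


def dimension1_score(text: dict[str, float], norms: dict[str, Norm]) -> float:
    """Involved-pole z-scores minus informational-pole z-scores.

    A negative result indicates a more 'informational' (nominal) style.
    """
    involved = sum(z_score(text[f], norms[f]) for f in INVOLVED)
    informational = sum(z_score(text[f], norms[f]) for f in INFORMATIONAL)
    return involved - informational


# Hypothetical reference norms: illustrative values, NOT Biber's published figures.
NORMS = {
    "present_tense": Norm(80, 30),
    "private_verbs": Norm(18, 10),
    "pro_verb_do": Norm(3, 3),
    "contractions": Norm(14, 18),
    "first_second_person": Norm(37, 35),
    "word_length": Norm(4.5, 0.4),
    "nouns": Norm(180, 35),
    "attributive_adjectives": Norm(60, 20),
    "prepositions": Norm(110, 25),
}

# Two invented texts: a nominal, essay-like one and a verbal, conversational one.
essay = {"present_tense": 50, "private_verbs": 10, "pro_verb_do": 0,
         "contractions": 0, "first_second_person": 2, "word_length": 5.1,
         "nouns": 250, "attributive_adjectives": 90, "prepositions": 135}
chat = {"present_tense": 140, "private_verbs": 40, "pro_verb_do": 9,
        "contractions": 50, "first_second_person": 107, "word_length": 4.1,
        "nouns": 110, "attributive_adjectives": 40, "prepositions": 85}

essay_score = dimension1_score(essay, NORMS)
chat_score = dimension1_score(chat, NORMS)
print(f"essay: {essay_score:.2f}, chat: {chat_score:.2f}")  # essay: -10.58, chat: 15.20
```

The subtraction encodes the bipolar nature of the dimension: a text that is both low in involved features and high in informational features is pushed further towards the negative, informational end than either property alone would push it.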
Informationally dense texts are associated with writing rather than spontaneous speech, because they require more pre-planning by the writer and more attentive decoding by the reader. The information load tends to be heavier in written texts because they are permanent and relatively context-free: they can be edited and revised, and they do not have to be understood immediately at the time of production. However, although informational density may be equated with academic maturity, this register does not suit every purpose of university student writing. Texts ostensibly written for a non-expert readership, and/or describing human actions rather than abstractions, are likely to be more successful if their information load is lighter. This helps to explain why, in both analyses, informational density was most strongly associated with the Social Sciences rather than Arts and Humanities disciplines, and with research-oriented genre families rather than reflective writing or writing aimed at non-expert readers.
Original language | English |
---|---|
Title of host publication | Proceedings of the International Conference CORPUS LINGUISTICS 2017 |
Place of Publication | St Petersburg |
Publisher | St. Petersburg State University |
Pages | 66-71 |
Number of pages | 6 |
Publication status | Published - 27 Jun 2017 |
Event | International Scientific Conference Corpus Linguistics 2017 - St Petersburg, Russian Federation Duration: 27 Jul 2017 → 30 Jul 2017 https://events.spbu.ru/events/anons/corpora-2017/?lang=Eng |
Conference
Conference | International Scientific Conference Corpus Linguistics 2017 |
---|---|
Country/Territory | Russian Federation |
City | St Petersburg |
Period | 27/07/17 → 30/07/17 |
Internet address | https://events.spbu.ru/events/anons/corpora-2017/?lang=Eng |
Fingerprint
Dive into the research topics of 'Information density in a corpus of university student writing'. Together they form a unique fingerprint.