Gap-fill Tests for Language Learners: Corpus-Driven Item Generation

Simon Smith, P.V.S. Avinesh, Adam Kilgarriff

Research output: Contribution to conference › Paper


Abstract

Gap-fill exercises have an important role in language teaching. They allow students to demonstrate that they understand vocabulary in context, discouraging memorisation of translations. It is time-consuming and difficult for item writers to create good test items, and even then test items are open to Sinclair’s critique of invented examples. We present a system, TEDDCLOG, which automatically generates draft test items from a corpus. TEDDCLOG takes the key (the word which will form the correct answer to the exercise) as input. It finds distractors (the alternative, wrong answers for the multiple-choice question) from a distributional thesaurus, and identifies a collocate of the key that does not occur with the distractors. Next it finds a simple corpus sentence containing the key and collocate. The system then presents the sentences and distractors to the user for approval, modification or rejection. The system is implemented using the API to the Sketch Engine, a leading corpus query system. We compare TEDDCLOG with other gap-fill generation systems, and offer a partial evaluation of the results.
Key Words: gap-fill, Sketch Engine, corpus linguistics, ELT, GDEX, proficiency testing
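The item-generation pipeline the abstract describes can be sketched as follows. This is a toy illustration only, with invented data and hypothetical names throughout: the real TEDDCLOG queries a large corpus through the Sketch Engine API rather than the hand-built dictionaries used here.

```python
# Toy sketch of a TEDDCLOG-style gap-fill pipeline. All data structures
# below are invented stand-ins for a distributional thesaurus, a collocation
# database, and a corpus; none of this is the actual TEDDCLOG code.

TOY_THESAURUS = {
    # key word -> distributionally similar words (candidate distractors)
    "strong": ["powerful", "heavy", "tough"],
}

TOY_COLLOCATES = {
    # word -> words it frequently co-occurs with in the toy corpus
    "strong": {"coffee", "wind", "argument"},
    "powerful": {"engine", "argument"},
    "heavy": {"rain", "traffic"},
    "tough": {"question", "meat"},
}

TOY_CORPUS = [
    "She made us some very strong coffee.",
    "A strong wind blew all night.",
    "He put forward a strong argument.",
]

def build_item(key, n_distractors=3):
    """Return (gapped sentence, options) for the key, or None on failure."""
    distractors = TOY_THESAURUS[key][:n_distractors]
    # A collocate is "safe" if it never occurs with any distractor, so that
    # no distractor could be an alternative correct answer.
    unsafe = set().union(*(TOY_COLLOCATES[d] for d in distractors))
    safe = TOY_COLLOCATES[key] - unsafe
    # Prefer short sentences as a crude proxy for "simple".
    for sentence in sorted(TOY_CORPUS, key=len):
        words = sentence.lower().rstrip(".").split()
        if key in words and safe & set(words):
            gapped = sentence.replace(key, "_____")
            return gapped, sorted([key] + distractors)
    return None

item = build_item("strong")
```

Here "argument" is rejected as a collocate because it also occurs with the distractor "powerful", so the sentence chosen is the wind example, where only "strong" fits the gap. The real system replaces each toy lookup with a Sketch Engine query and adds GDEX-style sentence scoring.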
Original language: English
Number of pages: 7
Publication status: Published - 2010

Bibliographical note

The attached paper is also available online on the International Institute of Information Technology, Hyderabad (India) website at: http://web.iiit.ac.in/~avinesh/papers/TeddclogICON2010.pdf. The paper was given at the ICON-2010: 8th International Conference on Natural Language Processing, Kharagpur, India. The full proceedings have been published by Macmillan Publishers, India - http://www.macmillanindia.com. Author's note: This research is linked to work on the construction and use of linguistic corpora being conducted by other members of the department. It describes and evaluates a corpus-based computer system which automatically generates gap-fill test items. The system requires further tuning and testing, but it could be used on a very wide scale for gatekeeping and proficiency testing by universities, language schools and other users, allowing almost limitless numbers of test items to be generated with much reduced human intervention.
The system is based on a careful and intuitive application of collocation facts in a linguistic corpus. Gap-fill (sentence completion) items include so-called distractors (incorrect answers) to “distract” the test-taker. These must be sufficiently different from the correct answer that no distractor could be an alternative correct answer, and sufficiently similar that the test item is not too easy. The system goes some way towards resolving that tension.
The evaluation was not exhaustive but still thorough and rigorous.
Other systems make test items automatically, but TEDDCLOG creates authentic items: they consist of language that someone has actually used in real life, rather than being invented by a test item writer or language teacher.

Keywords

  • gap-fill
  • Sketch Engine
  • corpus
  • linguistics
  • ELT
  • GDEX
  • proficiency
  • testing

Cite this

Gap-fill Tests for Language Learners: Corpus-Driven Item Generation. / Smith, Simon; Avinesh, P.V.S.; Kilgarriff, Adam.

2010.

Research output: Contribution to conference › Paper

@conference{1d1945d13c0848adac57f0820f7261f6,
title = "Gap-fill Tests for Language Learners: Corpus-Driven Item Generation",
abstract = "Gap-fill exercises have an important role in language teaching. They allow students to demonstrate that they understand vocabulary in context, discouraging memorisation of translations. It is time-consuming and difficult for item writers to create good test items, and even then test items are open to Sinclair’s critique of invented examples. We present a system, TEDDCLOG, which automatically generates draft test items from a corpus. TEDDCLOG takes the key (the word which will form the correct answer to the exercise) as input. It finds distractors (the alternative, wrong answers for the multiple-choice question) from a distributional thesaurus, and identifies a collocate of the key that does not occur with the distractors. Next it finds a simple corpus sentence containing the key and collocate. The system then presents the sentences and distractors to the user for approval, modification or rejection. The system is implemented using the API to the Sketch Engine, a leading corpus query system. We compare TEDDCLOG with other gap-fill generation systems, and offer a partial evaluation of the results. Key Words: gap-fill, Sketch Engine, corpus linguistics, ELT, GDEX, proficiency testing",
keywords = "gap-fill, Sketch Engine, corpus, linguistics, ELT, GDEX, proficiency, testing",
author = "Simon Smith and P.V.S. Avinesh and Adam Kilgarriff",
note = "The attached paper is also available online on the International Institute of Information Technology, Hyderabad (India) website at: http://web.iiit.ac.in/~avinesh/papers/TeddclogICON2010.pdf. The paper was given at the ICON-2010: 8th International Conference on Natural Language Processing, Kharagpur, India. The full proceedings have been published by Macmillan Publishers, India - http://www.macmillanindia.com. Author's note: This research is linked to work on the construction and use of linguistic corpora being conducted by other members of the department. It describes and evaluates a corpus-based computer system which automatically generates gap-fill test items. The system requires further tuning and testing, but it could be used on a very wide scale for gatekeeping and proficiency testing by universities, language schools and other users, allowing almost limitless numbers of test items to be generated with much reduced human intervention. The system is based on a careful and intuitive application of collocation facts in a linguistic corpus. Gap-fill (sentence completion) items include so-called distractors (incorrect answers) to “distract” the test-taker. These must be sufficiently different from the correct answer that no distractor could be an alternative correct answer, and sufficiently similar that the test item is not too easy. The system goes some way towards resolving that tension. The evaluation was not exhaustive but still thorough and rigorous. Other systems make test items automatically, but TEDDCLOG creates authentic items: they consist of language that someone has actually used in real life, rather than being invented by a test item writer or language teacher.",
year = "2010",
language = "English",

}

TY - CONF

T1 - Gap-fill Tests for Language Learners: Corpus-Driven Item Generation

AU - Smith, Simon

AU - Avinesh, P.V.S.

AU - Kilgarriff, Adam

N1 - The attached paper is also available online on the International Institute of Information Technology, Hyderabad (India) website at: http://web.iiit.ac.in/~avinesh/papers/TeddclogICON2010.pdf. The paper was given at the ICON-2010: 8th International Conference on Natural Language Processing, Kharagpur, India. The full proceedings have been published by Macmillan Publishers, India - http://www.macmillanindia.com. Author's note: This research is linked to work on the construction and use of linguistic corpora being conducted by other members of the department. It describes and evaluates a corpus-based computer system which automatically generates gap-fill test items. The system requires further tuning and testing, but it could be used on a very wide scale for gatekeeping and proficiency testing by universities, language schools and other users, allowing almost limitless numbers of test items to be generated with much reduced human intervention. The system is based on a careful and intuitive application of collocation facts in a linguistic corpus. Gap-fill (sentence completion) items include so-called distractors (incorrect answers) to “distract” the test-taker. These must be sufficiently different from the correct answer that no distractor could be an alternative correct answer, and sufficiently similar that the test item is not too easy. The system goes some way towards resolving that tension. The evaluation was not exhaustive but still thorough and rigorous. Other systems make test items automatically, but TEDDCLOG creates authentic items: they consist of language that someone has actually used in real life, rather than being invented by a test item writer or language teacher.

PY - 2010

Y1 - 2010

N2 - Gap-fill exercises have an important role in language teaching. They allow students to demonstrate that they understand vocabulary in context, discouraging memorisation of translations. It is time-consuming and difficult for item writers to create good test items, and even then test items are open to Sinclair’s critique of invented examples. We present a system, TEDDCLOG, which automatically generates draft test items from a corpus. TEDDCLOG takes the key (the word which will form the correct answer to the exercise) as input. It finds distractors (the alternative, wrong answers for the multiple-choice question) from a distributional thesaurus, and identifies a collocate of the key that does not occur with the distractors. Next it finds a simple corpus sentence containing the key and collocate. The system then presents the sentences and distractors to the user for approval, modification or rejection. The system is implemented using the API to the Sketch Engine, a leading corpus query system. We compare TEDDCLOG with other gap-fill generation systems, and offer a partial evaluation of the results. Key Words: gap-fill, Sketch Engine, corpus linguistics, ELT, GDEX, proficiency testing

AB - Gap-fill exercises have an important role in language teaching. They allow students to demonstrate that they understand vocabulary in context, discouraging memorisation of translations. It is time-consuming and difficult for item writers to create good test items, and even then test items are open to Sinclair’s critique of invented examples. We present a system, TEDDCLOG, which automatically generates draft test items from a corpus. TEDDCLOG takes the key (the word which will form the correct answer to the exercise) as input. It finds distractors (the alternative, wrong answers for the multiple-choice question) from a distributional thesaurus, and identifies a collocate of the key that does not occur with the distractors. Next it finds a simple corpus sentence containing the key and collocate. The system then presents the sentences and distractors to the user for approval, modification or rejection. The system is implemented using the API to the Sketch Engine, a leading corpus query system. We compare TEDDCLOG with other gap-fill generation systems, and offer a partial evaluation of the results. Key Words: gap-fill, Sketch Engine, corpus linguistics, ELT, GDEX, proficiency testing

KW - gap-fill

KW - Sketch Engine

KW - corpus

KW - linguistics

KW - ELT

KW - GDEX

KW - proficiency

KW - testing

M3 - Paper

ER -