Gap-fill Tests for Language Learners: Corpus-Driven Item Generation

Simon Smith, P.V.S. Avinesh, Adam Kilgarriff

Research output: Contribution to conference › Paper › peer-review



Gap-fill exercises have an important role in language teaching. They allow students to demonstrate that they understand vocabulary in context, discouraging memorisation of translations. It is time-consuming and difficult for item writers to create good test items, and even then test items are open to Sinclair’s critique of invented examples. We present a system, TEDDCLOG, which automatically generates draft test items from a corpus. TEDDCLOG takes the key (the word which will form the correct answer to the exercise) as input. It finds distractors (the alternative, wrong answers for the multiple-choice question) from a distributional thesaurus, and identifies a collocate of the key that does not occur with the distractors. Next it finds a simple corpus sentence containing the key and collocate. The system then presents the sentences and distractors to the user for approval, modification or rejection. The system is implemented using the API to the Sketch Engine, a leading corpus query system. We compare TEDDCLOG with other gap-fill-generation systems, and offer a partial evaluation of the results.
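The pipeline described in the abstract (key, then distractors, then a discriminating collocate, then a simple sentence) can be illustrated with a minimal Python sketch. This is not the TEDDCLOG implementation and it does not call the Sketch Engine API: the thesaurus, collocate lists and sentences are small invented in-memory stand-ins, all function and variable names are hypothetical, and "simple sentence" is approximated crudely by picking the shortest candidate.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Invented toy data standing in for corpus resources (assumption: a real
# system would obtain these from a corpus query system instead).
THESAURUS = {"strong": ["powerful", "heavy", "tough", "big"]}
COLLOCATES = {  # collocates per word, most salient first
    "strong": ["argument", "wind", "tea"],
    "powerful": ["argument", "engine"],
    "heavy": ["rain", "wind"],
    "tough": ["decision", "question"],
}
SENTENCES = [
    "Although she preferred coffee, she made a pot of strong tea for the visitors.",
    "He drank strong tea every morning.",
    "The strong wind blew all night.",
]


@dataclass
class Item:
    key: str
    collocate: str
    stem: str            # sentence with the key replaced by a gap
    options: List[str]   # key plus distractors, alphabetically ordered


def pick_distractors(key: str, n: int = 3) -> List[str]:
    """Take the first n thesaurus neighbours of the key as distractors."""
    return THESAURUS.get(key, [])[:n]


def pick_collocate(key: str, distractors: List[str]) -> Optional[str]:
    """Return the first collocate of the key that no distractor shares."""
    for cand in COLLOCATES.get(key, []):
        if all(cand not in COLLOCATES.get(d, []) for d in distractors):
            return cand
    return None


def pick_sentence(key: str, collocate: str) -> Optional[str]:
    """Shortest sentence containing both words (a crude proxy for 'simple')."""
    hits = [s for s in SENTENCES
            if {key, collocate} <= set(re.findall(r"[a-z']+", s.lower()))]
    return min(hits, key=len) if hits else None


def make_item(key: str) -> Optional[Item]:
    """Build one draft item, or None if any step finds nothing usable."""
    distractors = pick_distractors(key)
    collocate = pick_collocate(key, distractors)
    if not distractors or collocate is None:
        return None
    sentence = pick_sentence(key, collocate)
    if sentence is None:
        return None
    stem = re.sub(rf"\b{re.escape(key)}\b", "_____", sentence,
                  count=1, flags=re.IGNORECASE)
    return Item(key, collocate, stem, sorted([key] + distractors))


if __name__ == "__main__":
    print(make_item("strong"))
```

In this toy data, "argument" and "wind" are rejected as collocates because a distractor also takes them, so the item is built around "tea". The result is only a draft: as in the workflow described above, a human would still approve, modify or reject it.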
Key Words: gap-fill, Sketch Engine, corpus linguistics, ELT, GDEX, proficiency
Original language: English
Number of pages: 7
Publication status: Published - 2010

Bibliographical note

The attached paper is also available online on the International Institute of Information Technology, Hyderabad (India) website. The paper was given at ICON-2010: 8th International Conference on Natural Language Processing, Kharagpur, India. The full proceedings have been published by Macmillan Publishers, India.

Author's note: This research is linked to work on the construction and use of linguistic corpora being conducted by other members of the department. It describes and evaluates a corpus-based computer system which automatically generates gap-fill test items. The system requires further tuning and testing, but it could be used on a very wide scale for gatekeeping and proficiency testing by universities, language schools and other users, allowing almost limitless numbers of test items to be generated with much reduced human intervention.
The system is based on a careful and intuitive application of collocation facts in a linguistic corpus. Gap-fill (sentence completion) items include so-called distractors (incorrect answers) to “distract” the test-taker. These must be sufficiently different from the correct answer that no distractor could be an alternative correct answer, and sufficiently similar that the test item is not too easy. The system goes some way towards resolving that tension.
The evaluation was not exhaustive, but it was thorough and rigorous.
Other systems make test items automatically, but TEDDCLOG creates authentic items: they consist of language that someone has actually used in real life, and are not created by a test item writer or language teacher.


  • gap-fill
  • Sketch Engine
  • corpus
  • linguistics
  • ELT
  • GDEX
  • proficiency
  • testing

