
Efficient Citation Screening by Weak Classifier Ensemble*

  • University of Sheffield

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding; peer-review


Abstract

Citation screening in systematic review is time-consuming. Machine learning can help semi-automate it but faces obstacles. Each systematic review is a new dataset without initial annotations. Extreme class imbalance against irrelevant studies makes it difficult to select a good subset of samples to train a classifier. The rigid requirement of a (near) total recall of relevant studies demands a careful trade-off between accuracy and recall. This paper pilots a weak classifier ensemble approach to tackle both challenges. The idea of ensembling is employed in two ways. First, multiple cost-effective large language models are applied and averaged to score and rank candidate studies to create a balanced pseudo-labelled training set. Second, different sets of pseudo-negative samples are bootstrapped from low-rank documents and multiple classifiers are trained and combined to make screening decisions. Experiments on 28 systematic reviews demonstrate significant performance improvements brought by the weakly supervised classifier ensemble, which also meets the rigid recall requirement for it to be safely used in practice.
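
The two ensembling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, the LLM relevance scores are assumed to arrive as plain lists of numbers, a nearest-centroid rule stands in for whatever weak classifier the paper uses, and the parameters (`n_pos`, `neg_pool_frac`, `threshold`) are illustrative values that do not come from the paper.

```python
import random
from statistics import mean


def average_scores(score_lists):
    """Step 1a: average per-document relevance scores from several scorers."""
    return [mean(doc_scores) for doc_scores in zip(*score_lists)]


def pseudo_label(avg_scores, n_pos, neg_pool_frac):
    """Step 1b: top-ranked documents become pseudo-positives; the lowest-ranked
    fraction of the remaining documents forms the pseudo-negative pool."""
    ranked = sorted(range(len(avg_scores)), key=lambda i: avg_scores[i], reverse=True)
    rest = ranked[n_pos:]
    pool_size = max(1, int(len(rest) * neg_pool_frac))
    return ranked[:n_pos], rest[-pool_size:]


def centroid(vectors):
    return [mean(column) for column in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_weak(features, pos_idx, neg_idx):
    """Assumed stand-in weak classifier: vote 1.0 if a document is at least as
    close to the pseudo-positive centroid as to the pseudo-negative one."""
    pos_c = centroid([features[i] for i in pos_idx])
    neg_c = centroid([features[i] for i in neg_idx])
    return lambda x: 1.0 if sq_dist(x, pos_c) <= sq_dist(x, neg_c) else 0.0


def train_ensemble(features, pos_idx, neg_pool, n_models, seed=0):
    """Step 2: each model gets its own bootstrap sample (drawn with replacement)
    of pseudo-negatives, balanced against the pseudo-positives."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        neg_idx = rng.choices(neg_pool, k=len(pos_idx))
        models.append(train_weak(features, pos_idx, neg_idx))
    return models


def screen(models, x, threshold=0.5):
    """Include a document when the share of models voting 'relevant' reaches
    the threshold; a lower threshold trades precision for recall."""
    return mean(m(x) for m in models) >= threshold
```

In this sketch, lowering `threshold` in `screen` is the knob that would favour recall, which is the property the abstract stresses for safe use in practice; how the paper actually combines its classifiers is not stated on this page.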
Original language: English
Title of host publication: 2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Editors: Hamed Alhoori, J. Stephen Downie, Mat Kelly, Sagnik Ray Choudhury, Ingo Frommholz, Jiangping Chen
Publisher: IEEE
Pages: 265-268
Number of pages: 4
ISBN (Electronic): 979-8-3315-6803-0
ISBN (Print): 979-8-3315-6804-7
DOIs
Publication status: Published - 2 Feb 2026
Event: Joint Conference on Digital Libraries
Duration: 15 Dec 2025 to 19 Dec 2025

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Conference

Conference: Joint Conference on Digital Libraries
Abbreviated title: JCDL
Period: 15/12/25 to 19/12/25

Bibliographical note

© 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Copyright © and Moral Rights are retained by the author(s) and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

This document is the author’s accepted manuscript version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it.

Keywords

  • Weakly supervised learning
  • Automated systematic review
  • Citation screening
  • Large language model
  • Ensemble
