
Large language model-based multiagent collaboration for abstract screening toward automated systematic reviews

  • University of Sheffield

Research output: Contribution to journal › Article › peer-review


Abstract

Systematic reviews (SRs) are essential for evidence-based practice but remain labor-intensive, especially during abstract screening. This study evaluates whether collaboration among multiple large language models (multi-LLM collaboration) can improve the efficiency and reduce the cost of abstract screening. Abstract screening was framed as a question-answering (QA) task using cost-effective LLMs. Three multi-LLM collaboration strategies were evaluated: majority voting, which averages the opinions of peer models; multi-agent debate for answer refinement; and LLM-based adjudication over the answers of the individual QA baselines. These strategies were evaluated on 28 SRs from the CLEF eHealth 2019 technology-assisted review benchmark using standard performance metrics, including mean average precision (MAP) and work saved over sampling at 95% recall (WSS@95%). Multi-LLM collaboration significantly outperformed the QA baselines. Majority voting was the best strategy overall, achieving the highest MAP of 0.462 and 0.341 on the subsets of SRs on clinical intervention and diagnostic technology assessment, respectively, with WSS@95% of 0.606 and 0.680, theoretically enabling up to a 68% workload reduction at 95% recall of all relevant studies. Multi-agent debate improved weaker models the most. Our adjudicator-as-a-ranker method was the second-strongest approach, surpassing adjudicator-as-a-judge, but at a significantly higher cost than majority voting and debate. Multi-LLM collaboration substantially improves abstract screening efficiency, and its success lies in model diversity. By making the best use of this diversity, majority voting stands out for both excellent performance and low cost compared with adjudication. Despite context-dependent gains and diminishing model diversity, multi-agent debate remains a cost-effective strategy and a potential direction for further research.
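The abstract combines a simple aggregation rule (majority voting) with two ranking metrics (MAP and WSS@95%). The following is a minimal Python sketch of how these pieces fit together, not the authors' implementation. It assumes each model returns a binary include/exclude vote per abstract; the paper's actual prompts, vote or score format, and tie-breaking are not described here, and all function names are hypothetical. Votes are averaged into a score, abstracts are ranked by that score, and the ranking is evaluated with average precision (averaged over reviews to give MAP) and with the commonly used definition WSS@R = (N - n_screened)/N - (1 - R).

```python
import math


def majority_vote_scores(votes):
    """Average per-model include (1) / exclude (0) votes for each abstract.

    votes: hypothetical input format, a dict mapping a model name to a list of
    0/1 votes, with every list in the same abstract order.
    Returns the fraction of models voting "include" for each abstract.
    """
    per_model = list(votes.values())
    n_docs = len(per_model[0])
    return [sum(v[i] for v in per_model) / len(per_model) for i in range(n_docs)]


def rank_by_score(scores):
    """Abstract indices by descending score; ties keep input order (stable sort)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def average_precision(ranking, relevant):
    """AP of one ranked list: mean of precision@k at the rank k of each relevant item."""
    if not relevant:
        raise ValueError("AP is undefined without relevant abstracts")
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over reviews: runs is a list of (ranking, relevant_set) pairs, one per review."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def wss_at_recall(ranking, relevant, target_recall=0.95):
    """Work saved over sampling for a ranked list at a target recall.

    Screen from the top until recall >= target_recall, then return
    (N - n_screened) / N - (1 - target_recall).
    """
    if not relevant:
        raise ValueError("WSS is undefined without relevant abstracts")
    n = len(ranking)
    # Small tolerance so floating-point error in the product cannot push ceil up by one.
    needed = math.ceil(target_recall * len(relevant) - 1e-9)
    hits = 0
    for screened, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        if hits >= needed:
            return (n - screened) / n - (1 - target_recall)
    raise ValueError("ranking does not contain enough relevant abstracts")


# Toy example: three models vote on five abstracts of one review.
votes = {
    "model_a": [1, 0, 1, 0, 0],
    "model_b": [1, 1, 0, 0, 0],
    "model_c": [1, 0, 1, 1, 0],
}
ranking = rank_by_score(majority_vote_scores(votes))  # [0, 2, 1, 3, 4]
relevant = {0, 3}
print(average_precision(ranking, relevant))  # 0.75
print(round(wss_at_recall(ranking, relevant), 2))  # 0.15
```

Averaging votes ranks highest the abstracts that several models agree on, which is one plausible way model diversity helps. If the models return inclusion probabilities instead of binary votes, the same averaging applies and produces fewer ties in the ranking.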
Original language: English
Pages (from-to): (In-Press)
Number of pages: 42
Journal: Biology Methods & Protocols
Volume: 11
Issue number: 1
Early online date: 4 Feb 2026
DOIs
Publication status: Published - 2 Mar 2026

Bibliographical note

Accepted manuscripts are PDF versions of the author’s final manuscript, as accepted for publication by the journal but prior to copyediting or typesetting. They can be cited using the author(s), article title, journal title, year of online publication, and DOI. They will be replaced by the final typeset articles, which may therefore contain changes. The DOI will remain the same throughout.

Keywords

  • abstract screening
  • ensemble
  • large language model
  • multiagent system
  • systematic review

ASJC Scopus subject areas

  • General Biochemistry, Genetics and Molecular Biology
  • General Agricultural and Biological Sciences

