Abstract
Much effort has been devoted to citation function classification over the past decades, but noteworthy issues remain. Annotation difficulty has resulted in limited data size, especially for minority classes, and inadequate representativeness of the underlying scientific domains. Concerning algorithmic classification, state-of-the-art deep learning-based methods are flawed in that they generate a feature vector for the whole citation context (or sentence) and fail to exploit the full range of citation modelling options. Responding to these issues, this paper studied contextualised citation function classification. Specifically, a large new citation context dataset was created by merging and re-annotating six datasets about computational linguistics. A variety of strong SciBERT-based citation function classification models were proposed, and new states of the art were achieved. Through deeper performance analysis, this study focused on answering several research questions about the effective ways of performing citation function classification. More specifically, the study justified the necessity of modelling in-text citations in context and confirmed the superiority of doing citation function classification at citation (segment) level. A particular emphasis was placed on in-depth per-class performance analysis to understand whether citation function classification is robust enough to suit various popular downstream applications and what further efforts are required to meet such analytic needs. Finally, a naïve ensemble classifier was proposed, which greatly improved citation function classification performance.
Original language | English |
---|---|
Pages (from-to) | 5117-5158 |
Number of pages | 42 |
Journal | Scientometrics |
Volume | 128 |
Issue number | 9 |
Early online date | 12 Jul 2023 |
Publication status | Published - Sept 2023 |
Bibliographical note
Copyright © and Moral Rights are retained by the author(s) and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. This document is the author’s post-print version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it.
Funder
The first author Xiaorui Jiang is partially supported by National Planning Office for Philosophy and Social Sciences of China (18ZDA238). Both authors have no competing interests to declare that are relevant to the content of this article.
Funders | Funder number |
---|---|
National Office for Philosophy and Social Sciences | 18ZDA238 |
Keywords
- Citation context analysis
- Citation function classification
- Deep Learning
- SciBERT
- Ensemble
ASJC Scopus subject areas
- General Computer Science