Abstract
Much effort has been devoted to citation function classification over the past decades, yet noteworthy issues remain. Annotation difficulty has left existing datasets limited in size, especially for minority classes, and limited in how well they represent a scientific domain. Differing annotation schemes make existing studies hard to map onto one another and to compare. On the algorithmic side, state-of-the-art deep learning-based methods are flawed in that they generate a feature vector for the whole citation context (or sentence) and fail to exploit the full range of citation modelling options. In response to these issues, this paper studied contextualised citation function classification. Specifically, a large new citation context dataset was created by merging and re-annotating six datasets about computational linguistics. A variety of strong SciBERT-based citation function classification models were proposed. In addition to achieving a new state of the art in citation function classification, this study focused on deeper performance analysis to answer several research questions about effective ways of performing citation function classification, more specifically, the necessity of modelling in-text citations in context and of performing classification at the citation (segment) level. Particular emphasis was placed on in-depth per-class performance analysis, in order to understand whether citation function classification is robust enough for scientometric applications, what implications follow for its applicability to different scientometric analysis tasks, and what further efforts are required to meet such analytic needs.
Original language | English |
---|---|
Number of pages | 38 |
Journal | Scientometrics |
Publication status | Submitted - 2022 |
Keywords
- citation context analysis
- citation function classification
- deep learning
- SciBERT
ASJC Scopus subject areas
- Computer Science(all)