Contextualised Modelling for Effective Citation Function Classification

Xiaorui Jiang, Chaoxiang Cai, Wenwen Fan, Tong Liu, Jingqiang Chen

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

1 Citation (Scopus)

Abstract

Citation function classification is an important task in scientific text mining. The past two decades have witnessed many computerised algorithms developed on various citation function datasets tailored to different annotation schemes. Recently, deep learning has advanced the state of the art by a large margin. However, several pitfalls remain. Due to annotation difficulty, datasets, especially their minority classes, are often too small for training effective deep learning models. A less discussed problem is that most state-of-the-art deep learning solutions generate a feature representation for the citation sentence or context rather than modelling individual in-text citations. This is conceptually flawed, as a single citation sentence commonly contains multiple in-text citations with different functions. In addition, existing deep learning studies have explored only a limited design space for encoding a citation and its surrounding context. This paper explores a wide range of modelling options based on SciBERT, the popular cross-disciplinary pre-trained scientific language model, and compares their performance on citation function classification in order to determine the most effective way of modelling a citation and its context. To deal with the data size issue, we created a large-scale citation function dataset by mapping, merging and re-annotating six publicly available datasets from the computational linguistics domain, adapting Teufel et al.'s 12-class scheme. The best F1 scores we achieved were around 66.16%, 71.39% and 73.56% on an 11-class annotation scheme slightly adapted from Teufel et al.'s 12-class scheme, a reduced 7-class scheme obtained by merging comparison functions, and Jurgens et al.'s 6-class scheme, respectively. A useful observation is that no single model is best for all functions; the trained model variants therefore support applications that emphasise a specific type or a specific group of citation functions.
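
The difference between encoding a whole citation sentence and modelling each in-text citation can be illustrated with a minimal Python sketch. It is not the authors' method: the marker strings, the simplified citation pattern and the function name `per_citation_inputs` are hypothetical, chosen only to show how one sentence with several citations could yield one classifier input per citation, each of which an encoder such as SciBERT could then process separately.

```python
import re
from typing import List

# Hypothetical marker strings; the abstract does not say how the target
# citation is actually encoded in the paper's models.
TARGET_OPEN, TARGET_CLOSE = "<cite>", "</cite>"

# Simplified assumption: citations look like "(Author, 2010)" or
# "(Author et al., 2015)". Real citation styles are far more varied.
CITATION_PATTERN = re.compile(r"\([A-Z][^()]*?\d{4}[a-z]?\)")


def per_citation_inputs(sentence: str) -> List[str]:
    """Return one copy of the sentence per in-text citation,
    with only that citation wrapped in the target markers."""
    inputs = []
    for match in CITATION_PATTERN.finditer(sentence):
        start, end = match.span()
        marked = (
            sentence[:start]
            + TARGET_OPEN + sentence[start:end] + TARGET_CLOSE
            + sentence[end:]
        )
        inputs.append(marked)
    return inputs


if __name__ == "__main__":
    example = ("We extend the parser of (Smith, 2010) and "
               "outperform (Lee et al., 2015).")
    for text in per_citation_inputs(example):
        print(text)
```

Each returned string keeps the full sentence as context but marks a different target, so the two citations in the example can receive different function labels instead of sharing a single sentence-level prediction.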

Original language: English
Title of host publication: Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval
Publisher: ACM
Pages: 93-103
Number of pages: 11
ISBN (Electronic): 9781450397629
DOIs
Publication status: Published - 16 Dec 2022
Event: 2022 6th International Conference on Natural Language Processing and Information Retrieval - Bangkok, Thailand
Duration: 16 Dec 2022 to 18 Dec 2022
http://www.nlpir.net/2022.html

Conference

Conference: 2022 6th International Conference on Natural Language Processing and Information Retrieval
Abbreviated title: NLPIR 2022
Country/Territory: Thailand
City: Bangkok
Period: 16/12/22 to 18/12/22
Internet address

Funding

Funder: National Office for Philosophy and Social Sciences
Funder number: 18ZDA238

    Keywords

    • Citation function classification
    • SciBERT
    • citation context analysis
    • citation intent identification
    • deep learning

    ASJC Scopus subject areas

    • Software
    • Human-Computer Interaction
    • Computer Vision and Pattern Recognition
    • Computer Networks and Communications
