The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles

Bello Aliyu Muhammad, Rahat Iqbal, Anne James

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding

Abstract

Systematic reviews are time-consuming, error-prone and labour-intensive because of the manual processes involved, with data extraction being an extremely difficult and cognitively demanding step. Automation can save a significant amount of time and reduce the workload. However, there is no unified approach to automatic data extraction in systematic reviews. This paper presents a canonical model of the structure of papers that serves as a unified approach and a foundation for the subsequent automatic extraction of information from scientific research articles. The model was developed using text mining and natural language processing techniques on one thousand (1000) published research papers. A novel approach was used to identify the various section headings in the papers; it achieved an accuracy of 82%. A statistical analysis of the most frequent words and phrases in the section headings was used to build the canonical model.
Original language: English
Title of host publication: 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS)
Publisher: IEEE Computer Society
Pages: 264-271
Number of pages: 8
ISBN (Electronic): 978-1-5386-9588-3
ISBN (Print): 978-1-5386-9589-0
DOI: https://doi.org/10.1109/SNAMS.2018.8554896
Publication status: Published - 3 Dec 2018
Event: Fifth International Conference on Social Networks Analysis, Management and Security - Valencia, Spain
Duration: 15 Oct 2018 to 18 Oct 2018
Conference number: 5th
http://emergingtechnet.org/SNAMS2018/

Conference

Conference: Fifth International Conference on Social Networks Analysis, Management and Security
Abbreviated title: SNAMS 2018
Country: Spain
City: Valencia
Period: 15/10/18 to 18/10/18
Internet address: http://emergingtechnet.org/SNAMS2018/

Fingerprint

Statistical methods
Automation
Personnel
Processing

Keywords

  • Data extraction
  • Systematic review
  • canonical structure
  • text mining and natural language processing

Cite this

Muhammad, B. A., Iqbal, R., & James, A. (2018). The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles. In 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 264 - 271). IEEE Computer Society. https://doi.org/10.1109/SNAMS.2018.8554896

The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles. / Muhammad, Bello Aliyu; Iqbal, Rahat; James, Anne.

2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE Computer Society, 2018. p. 264-271.

Muhammad, BA, Iqbal, R & James, A 2018, The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles. in 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE Computer Society, pp. 264-271, Fifth International Conference on Social Networks Analysis, Management and Security, Valencia, Spain, 15/10/18. https://doi.org/10.1109/SNAMS.2018.8554896
Muhammad BA, Iqbal R, James A. The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles. In 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE Computer Society. 2018. p. 264-271. https://doi.org/10.1109/SNAMS.2018.8554896
Muhammad, Bello Aliyu; Iqbal, Rahat; James, Anne. / The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles. 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE Computer Society, 2018. pp. 264-271.
@inproceedings{b10f31e61ed24417b7f91c6e1e8dfce4,
title = "The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles",
abstract = "The systematic review activity is time-consuming, error prone and labour intensive activity due to the manual processes involved; with data extraction being an extremely difficult and cognitively demanding process. Automation can save a significant amount of time and reduces the workload. However, there is no unified approach for automatic data extraction in systematic reviews. This paper presents a canonical model of structure of the papers that serves as a unified approach and a foundation for subsequent extraction of information from scientific research articles automatically. The model was developed using text mining and natural language processing techniques on one thousand (1000) published research papers. A novel approach was used to identify the various section headings from the papers. This approach achieved an accuracy of 82{\%}. A statistical analysis of the most frequent words/phrases in the section headings was used to build the canonical model of structure of the papers.",
keywords = "Data extraction, Systematic review, canonical structure, text mining and natural language processing",
author = "Muhammad, {Bello Aliyu} and Rahat Iqbal and Anne James",
year = "2018",
month = "12",
day = "3",
doi = "10.1109/SNAMS.2018.8554896",
language = "English",
isbn = "978-1-5386-9589-0",
pages = "264--271",
booktitle = "2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS)",
publisher = "IEEE Computer Society",

}

TY  - GEN
T1  - The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles
AU  - Muhammad, Bello Aliyu
AU  - Iqbal, Rahat
AU  - James, Anne
PY  - 2018/12/3
Y1  - 2018/12/3
AB  - The systematic review activity is time-consuming, error prone and labour intensive activity due to the manual processes involved; with data extraction being an extremely difficult and cognitively demanding process. Automation can save a significant amount of time and reduces the workload. However, there is no unified approach for automatic data extraction in systematic reviews. This paper presents a canonical model of structure of the papers that serves as a unified approach and a foundation for subsequent extraction of information from scientific research articles automatically. The model was developed using text mining and natural language processing techniques on one thousand (1000) published research papers. A novel approach was used to identify the various section headings from the papers. This approach achieved an accuracy of 82%. A statistical analysis of the most frequent words/phrases in the section headings was used to build the canonical model of structure of the papers.
KW  - Data extraction
KW  - Systematic review
KW  - canonical structure
KW  - text mining and natural language processing
U2  - 10.1109/SNAMS.2018.8554896
DO  - 10.1109/SNAMS.2018.8554896
M3  - Conference proceeding
SN  - 978-1-5386-9589-0
SP  - 264
EP  - 271
BT  - 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS)
PB  - IEEE Computer Society
ER  -