Abstract
Systematic reviews are time-consuming, error-prone and labour-intensive because of the manual processes involved, and data extraction is a particularly difficult and cognitively demanding step. Automation can save a significant amount of time and reduce the workload. However, there is no unified approach to automatic data extraction in systematic reviews. This paper presents a canonical model of the structure of research papers that serves as a unified approach and a foundation for the subsequent automatic extraction of information from scientific research articles. The model was developed by applying text mining and natural language processing techniques to one thousand (1000) published research papers. A novel approach was used to identify the section headings in these papers, and it achieved an accuracy of 82%. A statistical analysis of the most frequent words and phrases in the section headings was used to build the canonical model of paper structure.
Original language | English |
---|---|
Title of host publication | 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS) |
Publisher | IEEE Computer Society |
Pages | 264 - 271 |
Number of pages | 8 |
ISBN (Electronic) | 978-1-5386-9588-3 |
ISBN (Print) | 978-1-5386-9589-0 |
Publication status | Published - 3 Dec 2018 |
Event | Fifth International Conference on Social Networks Analysis, Management and Security, Valencia, Spain; duration: 15 Oct 2018 → 18 Oct 2018; conference number: 5th; http://emergingtechnet.org/SNAMS2018/ |
Conference
Conference | Fifth International Conference on Social Networks Analysis, Management and Security |
---|---|
Abbreviated title | SNAMS 2018 |
Country/Territory | Spain |
City | Valencia |
Period | 15/10/18 → 18/10/18 |
Internet address | http://emergingtechnet.org/SNAMS2018/ |
Keywords
- Data extraction
- Systematic review
- Canonical structure
- Text mining and natural language processing