TY - JOUR
T1 - Disease discovery-based emotion lexicon
T2 - a heuristic approach to characterise sicknesses in microblogs
AU - Sarsam, Samer Muthana
AU - Al-Samarraie, Hosam
AU - Al-Sadi, Ahmed
PY - 2020/9/22
Y1 - 2020/9/22
N2 - The analysis of microblogging data has been widely used to discover valuable resources for timely identification of critical illness-related incidents and serious epidemics. Despite the numerous efforts made in this field, making an accurate and timely prediction of incidents and outbreaks based on certain clinical symptoms remains a great challenge. Hence, providing an investigative method can be crucial in characterising a disease state. This study proposes a heuristic mechanism by using an unsupervised learning technique to efficiently detect disease incidents and outbreaks from the tweet content. We categorised the types of emotions that are highly linked to a specific disease and its related terminologies. Emotions (anger, fear, sadness, and joy) and diabetes-related terminologies were extracted using the NRC Affect Intensity Lexicon and a part-of-speech tagging tool. A two-cluster solution was established and validated. The classification results showed that K-means clustering with two centroids had the highest classification accuracy (96.53%). The relationship between diabetes-related terms (in the form of tweets) and emotions was established and assessed using the association rules mining technique. The results showed that diabetes-related terms were exclusively associated with fear emotions. This study offers a novel mechanism for disease recognition and outbreak detection in microblogs that can be useful in making informed decisions about a disease state.
AB - The analysis of microblogging data has been widely used to discover valuable resources for timely identification of critical illness-related incidents and serious epidemics. Despite the numerous efforts made in this field, making an accurate and timely prediction of incidents and outbreaks based on certain clinical symptoms remains a great challenge. Hence, providing an investigative method can be crucial in characterising a disease state. This study proposes a heuristic mechanism by using an unsupervised learning technique to efficiently detect disease incidents and outbreaks from the tweet content. We categorised the types of emotions that are highly linked to a specific disease and its related terminologies. Emotions (anger, fear, sadness, and joy) and diabetes-related terminologies were extracted using the NRC Affect Intensity Lexicon and a part-of-speech tagging tool. A two-cluster solution was established and validated. The classification results showed that K-means clustering with two centroids had the highest classification accuracy (96.53%). The relationship between diabetes-related terms (in the form of tweets) and emotions was established and assessed using the association rules mining technique. The results showed that diabetes-related terms were exclusively associated with fear emotions. This study offers a novel mechanism for disease recognition and outbreak detection in microblogs that can be useful in making informed decisions about a disease state.
KW - Association rules mining
KW - Diabetes
KW - Disease detection
KW - Emotion lexicon
KW - Part-of-speech tagging
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85091295649&partnerID=8YFLogxK
U2 - 10.1007/s13721-020-00271-6
DO - 10.1007/s13721-020-00271-6
M3 - Article
AN - SCOPUS:85091295649
SN - 2192-6662
VL - 9
JO - Network Modeling Analysis in Health Informatics and Bioinformatics
JF - Network Modeling Analysis in Health Informatics and Bioinformatics
IS - 1
M1 - 65
ER -