Abstract
Social media networks generate massive amounts of valuable data that are available online and easy to access. Many users share images, videos, comments, reviews, news and opinions on different social networking sites, with Twitter being one of the most popular. Data collected from Twitter is highly unstructured, and extracting useful information from tweets is a challenging task. Twitter has a large number of Arabic-speaking users, most of whom write their tweets in Arabic. While there has been a lot of research on sentiment analysis in English, the amount of research and the number of datasets available for the Arabic language are limited.
This paper introduces an Arabic language dataset of opinions on health services, collected from Twitter. The paper first details the process of collecting the data from Twitter, and then the process of filtering, pre-processing and annotating the Arabic text in order to build a large sentiment analysis dataset in Arabic. Several Machine Learning algorithms (Naïve Bayes, Support Vector Machine and Logistic Regression), alongside Deep and Convolutional Neural Networks, were used in our sentiment analysis experiments on this health dataset.
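The filtering and pre-processing stage mentioned above can be illustrated with a minimal sketch. The abstract does not list the paper's actual cleaning rules, so every step here (dropping retweets and duplicates, removing URLs and mentions, stripping diacritics and tatweel, unifying alef variants) is an assumption drawn from common practice in Arabic tweet processing, and the function names `normalize_tweet` and `filter_tweets` are hypothetical.

```python
import re
import unicodedata

# Hypothetical cleaning rules: the paper's real filtering steps are not
# given in the abstract, so these are common-practice assumptions.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
ALEF_VARIANTS_RE = re.compile("[\u0622\u0623\u0625]")  # alef with madda/hamza
TATWEEL = "\u0640"  # elongation character used for stretching words


def normalize_tweet(text: str) -> str:
    """Return a cleaned, whitespace-normalized version of one tweet."""
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @mentions
    text = text.replace("#", " ")         # keep hashtag words, drop the symbol
    # Strip diacritics (harakat): they are Unicode non-spacing marks (Mn).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = text.replace(TATWEEL, "")      # remove elongation
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # unify to bare alef
    return " ".join(text.split())         # collapse whitespace


def filter_tweets(tweets):
    """Drop retweets, empty tweets and duplicates; return cleaned texts."""
    seen = set()
    kept = []
    for raw in tweets:
        if raw.startswith("RT "):         # assumed retweet convention
            continue
        clean = normalize_tweet(raw)
        if clean and clean not in seen:
            seen.add(clean)
            kept.append(clean)
    return kept


if __name__ == "__main__":
    sample = ["RT @a hello", "hello", "hello https://t.co/x"]
    print(filter_tweets(sample))
```

Cleaned texts of this kind would then be annotated and passed to a feature extractor and classifier. Whether the authors normalized letters or removed duplicates in this way is not stated in the abstract.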
Publisher Statement: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Original language | English |
---|---|
Title of host publication | 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR) |
Publisher | IEEE Computer Society |
Pages | 114-118 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-5090-6628-5 |
ISBN (Print) | 978-1-5090-6629-2 |
DOIs | |
Publication status | Published - 16 Oct 2017 |
Event | International Workshop on Arabic and derived Script Analysis and Recognition, University Lorraine, Nancy, France; Duration: 3 Apr 2017 → 5 Apr 2017; Conference number: 1; http://www.teklia.com/?p=220 |
Workshop
Workshop | International Workshop on Arabic and derived Script Analysis and Recognition |
---|---|
Abbreviated title | ASAR |
Country/Territory | France |
City | Nancy |
Period | 3/04/17 → 5/04/17 |
Internet address | http://www.teklia.com/?p=220 |
Keywords
- Sentiment analysis
- Filtering
- Feature extraction
- Support vector machines
- Semantics
- Neural networks
- Bayes methods
- Big Data
- data mining
- learning (artificial intelligence)
- neural nets
- pattern classification
- regression analysis
- sentiment analysis
- social networking (online)
- support vector machines
- Arabic language sentiment analysis
- health services
- social media network phenomenon
- tweets
- Arabic language dataset
- big sentiment analysis dataset
- Convolutional Neural Networks
- health dataset
- Arabic text filtering
- Arabic text annotation
- Arabic text preprocessing
- machine learning algorithms
- naïve Bayes algorithm
- support vector machine algorithm
- logistic regression algorithm
- deep neural network
- Sentiment Analysis
- Machine Learning
- Deep Neural Networks
- Arabic Language