Abstract
Over the past decade, people have increasingly turned to social media platforms to share their experiences and daily activities with others. Twitter is a popular example, where users post short messages. Twitter data is easily accessible, which makes it a good source for analysis. Sentiment analysis is the task of extracting and classifying the opinions or feelings expressed in text.
This thesis first develops an Arabic Health Services (AHS) dataset for sentiment classification. The dataset was collected from Twitter, filtered, and annotated manually by three people. Then, three word embedding techniques, Word2vec, GloVe, and fastText, were trained on two different Arabic corpora. The resulting models produce word vectors that serve as input to the sentiment classifiers. The thesis also studies the effectiveness of different features for classifying Arabic text, and investigates the effectiveness of word embedding models for sentiment analysis of short Arabic text.
Different sentiment analysis levels were proposed to deal with the morphological and orthographic complexities of the Arabic language. Additionally, several deep neural network models and machine learning classifiers were trained to perform sentiment analysis on the newly developed AHS dataset. In particular, a model that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) was used to analyse an Arabic dataset for the first time. The proposed CNN and LSTM model achieved good sentiment classification performance.
Date of Award | Apr 2019 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Vasile Palade (Supervisor), Matthew England (Supervisor) & Rahat Iqbal (Supervisor) |