Title : | Machine Learning and Deep Learning for Moroccan Dialect Sentiment Analysis | Document type : | final-year project | Authors : | El Mahdi Mercha, Author | Languages : | French (fre) | Categories : | BIG DATA
| Decimal index : | mast 256/19 | Abstract : | Sentiment analysis, or opinion mining, is a research area of natural language processing
that focuses on the study of people's opinions, sentiments, emotions and attitudes
about several entities such as products, services, issues, events and their attributes.
In recent years, a lot of research in the field of sentiment analysis has
exploited messages shared on social networks and written in English, French
and other languages. However, few studies have been carried out on sentiment analysis for Arabic
dialects, especially the Moroccan one.
Because its writers frequently switch between MSA, Latin and alphanumeric transcriptions,
the processing of this dialect becomes even more complicated.
Since YouTube is commonly used in Arab countries, and specifically in Morocco, we are
interested, in this research, in the sentiment analysis of Moroccan users' comments
on several YouTube videos.
In this master's thesis, we explore the field of Moroccan dialect sentiment analysis.
First, we describe the steps followed to construct the Moroccan Dialect Sentiment
Analysis Corpus (MDSAC). Second, we carry out several sets of experiments to
explore the impact of (i) the different settings for text representation and (ii) the machine
learning algorithms used. The study is twofold. The first part investigates the
performance of classical machine learning algorithms, namely Support Vector Machines (SVM),
Naïve Bayes (NB) and Logistic Regression (LR), in parallel with different settings
for text representation, such as stemming type, indexing and weighting schemes.
The second part concerns the use of deep learning for both text
representation and classification. Two word embedding models were built, one using
Continuous Bag of Words (CBoW) and the other using Skip-gram. Two deep learning
architectures (CNN and LSTM) were compared using the two learned
word representations. For both parts, a large set of experiments was
conducted to tune the different hyperparameters. |