This project is part of the Neural Networks and Deep Learning course. It covers the implementation of the Transformer model and its variants, and is divided into two parts.
The first part applies the Transformer model to Speech Emotion Recognition (SER). We use HuBERT, a Transformer-based speech model, to classify the emotion of each utterance in the ShEMO dataset.
This dataset includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. ShEMO covers speech samples from 87 native Persian speakers for five basic emotions (anger, fear, happiness, sadness, and surprise) as well as a neutral state, along with orthographic and phonetic transcriptions (according to the International Phonetic Alphabet).
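As a minimal sketch of how the six ShEMO classes listed above could be turned into integer targets for a classifier, the snippet below builds a label mapping in plain Python. The class order, the names `EMOTIONS`, `label2id`, `id2label`, and the helper `encode_labels` are assumptions made for illustration and are not taken from the project code.

```python
# Six target classes as listed in the dataset description above.
# The ordering (and therefore the integer ids) is an assumption for illustration.
EMOTIONS = ["anger", "fear", "happiness", "sadness", "surprise", "neutral"]

# Forward and reverse lookups between class names and integer ids.
label2id = {name: idx for idx, name in enumerate(EMOTIONS)}
id2label = {idx: name for name, idx in label2id.items()}


def encode_labels(names):
    """Map emotion names (case-insensitive) to integer class ids."""
    try:
        return [label2id[name.lower()] for name in names]
    except KeyError as err:
        raise ValueError(f"unknown emotion label: {err.args[0]!r}") from None


# Number of output classes a classification head would need.
num_labels = len(EMOTIONS)
```

A mapping like this is typically what a sequence-classification head needs to size its output layer and to translate predictions back into emotion names.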
The second part applies the Transformer model to Natural Language Processing (NLP). We perform natural language inference (NLI) in Persian using the ParsBERT model on FarsTail, a Persian NLI dataset.
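To illustrate the shape of the NLI task, the sketch below represents one premise/hypothesis pair and lays it out in the sentence-pair format used by BERT-style models such as ParsBERT. It assumes the usual three-way NLI label set; the label order, the `NLIExample` class, and the helpers `to_bert_pair` and `label_id` are hypothetical names for illustration. In practice a tokenizer inserts the special tokens itself, so this is only a string-level picture of the input.

```python
from dataclasses import dataclass

# Usual three-way NLI label set; the order here is an assumption.
NLI_LABELS = ["entailment", "contradiction", "neutral"]


@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str


def to_bert_pair(example, cls="[CLS]", sep="[SEP]"):
    """Lay out a sentence pair as: [CLS] premise [SEP] hypothesis [SEP]."""
    return f"{cls} {example.premise} {sep} {example.hypothesis} {sep}"


def label_id(example):
    """Return the integer class id for the example's label."""
    return NLI_LABELS.index(example.label)


example = NLIExample(
    premise="A man is sleeping.",
    hypothesis="A person is awake.",
    label="contradiction",
)
print(to_bert_pair(example))
print(label_id(example))
```

The model then predicts one of the three labels from the joint encoding of the premise and the hypothesis.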
This project is licensed under the MIT License - see the LICENSE file for details.