This project classifies text into one of four emotions: Joy, Sadness, Anger, or Fear. The system uses a Recurrent Neural Network (RNN) model for high accuracy and integrates with a Streamlit application for real-time emotion prediction.
Data Collection:
- The dataset was collected from Kaggle. You can find it here.
Data Cleaning and Filtering:
- Removed duplicate rows.
Text Preprocessing:
- Cleaned texts by removing non-alphabetic characters, converting to lowercase, removing stopwords, and applying stemming.
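The cleaning steps above can be sketched as a single function. This is a minimal, dependency-free illustration, not the project's actual code: the stopword list is a tiny hand-picked sample, and `crude_stem` is a hypothetical stand-in for a real stemmer such as NLTK's `PorterStemmer`.

```python
import re

# Small illustrative stopword list (an assumption); a real pipeline would
# likely use a fuller list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "am", "are", "was",
             "i", "to", "of", "and", "so", "about"}


def crude_stem(word):
    """Hypothetical stand-in stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text, stem=crude_stem, stopwords=STOPWORDS):
    """Keep letters only, lowercase, drop stopwords, then stem each token."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    tokens = letters_only.lower().split()
    return [stem(tok) for tok in tokens if tok not in stopwords]


tokens = clean_text("I am SO worried about the exams!!!")
```

Because the stemmer is passed in as an argument, a real one can be swapped in without changing the rest of the function, for example `clean_text(text, stem=PorterStemmer().stem)`.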
Feature Extraction:
- Trained a Word2Vec model to convert text into vector representations.
- Computed average word vectors for each text sample.
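Averaging word vectors turns a variable-length list of tokens into one fixed-size feature vector. The sketch below shows the idea with a plain dictionary of toy 3-dimensional vectors standing in for a trained Word2Vec model; the function name and the zero-vector fallback for texts with no known words are assumptions.

```python
import numpy as np


def average_vector(tokens, vectors, dim):
    """Mean of the vectors of known tokens; a zero vector if none are known."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0).astype(np.float32)


# Toy embeddings standing in for a trained Word2Vec model (illustrative values).
toy_vectors = {
    "happi": np.array([1.0, 0.0, 2.0]),
    "day": np.array([3.0, 2.0, 0.0]),
}

# "unseen" has no vector, so it is skipped rather than treated as zeros.
features = average_vector(["happi", "day", "unseen"], toy_vectors, dim=3)
```

With gensim, the `vectors` argument would presumably be the model's `wv` attribute, which supports the same membership test and indexing used here.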
Dataset Splitting:
- Split data into training and validation sets.
- Encoded labels for classification.
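A minimal sketch of the split and label encoding with scikit-learn follows. The sample texts, the 50/50 split, the stratification, and the random seed are all illustrative assumptions; the source does not state the values actually used.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy data for illustration only.
texts = [
    "what a lovely day", "i miss her so much",
    "this makes me furious", "i am scared of the dark",
    "best news ever", "i feel so alone",
    "stop shouting at me", "that noise terrified me",
]
labels = ["joy", "sadness", "anger", "fear"] * 2

# LabelEncoder maps class names to integers in sorted order:
# anger=0, fear=1, joy=2, sadness=3.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Stratifying keeps the class balance the same in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    texts, y, test_size=0.5, stratify=y, random_state=42
)
```

Keeping the fitted `encoder` around matters later: `encoder.inverse_transform` is what turns a predicted integer back into an emotion name.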
Gradient Boosting Models:
- Used Optuna for hyperparameter tuning of `LGBMClassifier` and `XGBClassifier`.
- Evaluated cross-validation scores to select the best model.
RNN Model:
- Built a Sequential model with an Embedding layer, SimpleRNN layer, and Dense output layer.
- Achieved approximately 89% accuracy with this RNN model, well above the 61-64% reached by the traditional machine learning models.
Gated RNN Model:
- Enhanced the model by replacing the SimpleRNN layer with a `GRU` (Gated Recurrent Unit) layer.
- This improvement boosted the accuracy to 93.5%, demonstrating a clear enhancement over the standard RNN.
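The architecture described above (Embedding, then a recurrent layer, then a Dense output) can be sketched in Keras as follows. The embedding size, the number of GRU units, the optimizer, and the loss are assumptions, since the source does not list them; only the layer sequence and the four output classes come from the description.

```python
from tensorflow import keras


def build_gru_model(vocab_size, embed_dim=64, units=64, num_classes=4):
    """Embedding -> GRU -> softmax over the emotion classes."""
    model = keras.Sequential([
        keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        keras.layers.GRU(units),  # swap for keras.layers.SimpleRNN(units) to get the baseline
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Sparse loss assumes integer-encoded labels (0..num_classes-1).
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Since only one layer differs between the two variants, the SimpleRNN and GRU models can be trained on identical inputs and compared directly.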
Model and Tokenizer Saving:
- Saved the trained GRU model, tokenizer, and label encoder for future use.
A Streamlit application lets users enter text and receive emotion predictions in real time. To run it, follow these steps:
Install Dependencies:
- Ensure you have the required libraries installed: `pip install streamlit pandas numpy pickle5 nltk keras tensorflow`
Run the Streamlit App:
- Navigate to the directory containing your Streamlit script and run: `streamlit run your_script_name.py`
Interact with the App:
- Open the provided URL in your browser to use the text classification app.
- Ensure that the `maxlen` parameter in the Streamlit app matches the value used during model training.
- The model and tokenizer files (`emotion_classification_rnn.h5`, `tokenizer.pkl`, and `label_encoder.pkl`) must be in the same directory as the Streamlit app; otherwise, provide the correct paths.
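To see why `maxlen` has to match, it helps to look at what padding does to a tokenized input. The dependency-free sketch below mirrors the default behavior of Keras `pad_sequences` (zeros added at the front, extra tokens dropped from the front); the helper name `pad_to_maxlen` and the value `MAXLEN = 6` are hypothetical, and `maxlen` is assumed to be positive.

```python
def pad_to_maxlen(sequence, maxlen, pad_value=0):
    """Left-pad or left-truncate a list of token ids to exactly maxlen items."""
    trimmed = sequence[-maxlen:]  # keep only the last maxlen ids
    return [pad_value] * (maxlen - len(trimmed)) + trimmed


MAXLEN = 6  # hypothetical; use the value from training

short_input = pad_to_maxlen([4, 9, 2], MAXLEN)
long_input = pad_to_maxlen([1, 2, 3, 4, 5, 6, 7, 8], MAXLEN)
```

A different `maxlen` at prediction time changes how much padding or truncation each input gets, so the model would see sequences shaped differently from the ones it was trained on.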