Skip to content

Latest commit

 

History

History
67 lines (46 loc) · 2.84 KB

README.md

File metadata and controls

67 lines (46 loc) · 2.84 KB

Text Emotion Classification

image

This project classifies text into one of four emotions: Joy, Sadness, Anger, or Fear. The system uses a Recurrent Neural Network (RNN) model for high accuracy and integrates with a Streamlit application for real-time emotion prediction.

Project Overview

1. Data Preparation

  • Data Collection:

    • The dataset used is collected from Kaggle. You can find it here.
  • Data Cleaning and Filtering:

    • Removed duplicate rows.
  • Text Preprocessing:

    • Cleaned texts by removing non-alphabetic characters, converting to lowercase, removing stopwords, and applying stemming.
  • Feature Extraction:

    • Trained a Word2Vec model to convert text into vector representations.
    • Computed average word vectors for each text sample.
  • Dataset Splitting:

    • Split data into training and validation sets.
    • Encoded labels for classification.

2. Model Training

  • Gradient Boosting Models:

    • Used Optuna for hyperparameter tuning of LGBMClassifier and XGBClassifier.
    • Evaluated cross-validation scores to select the best model.
  • RNN Model:

    • Built a Sequential model with an Embedding layer, SimpleRNN layer, and Dense output layer.
    • Achieved an accuracy of approximately 89% with this RNN model, which significantly outperforms traditional machine learning models that achieve 61-64% accuracy.
  • Gated RNN Model:

    • Enhanced the model by replacing the SimpleRNN layer with a GRU (Gated Recurrent Unit) layer.
    • This improvement boosted the accuracy to 93.5%, demonstrating a clear enhancement over the standard RNN.
  • Model and Tokenizer Saving:

    • Saved the trained GRU model, tokenizer, and label encoder for future use.

3. Streamlit Application

A Streamlit application is developed to allow users to input text and receive emotion predictions in real-time. For more details on running the Streamlit application, refer to the following instructions:

  1. Install Dependencies:

    • Ensure you have the required libraries installed:
      pip install streamlit pandas numpy pickle5 nltk keras tensorflow
  2. Run the Streamlit App:

    • Navigate to the directory containing your Streamlit script and run:
      streamlit run your_script_name.py
  3. Interact with the App:

    • Open the provided URL in your browser to use the text classification app.

Additional Notes

  • Ensure that the maxlen parameter in the Streamlit app matches the value used during model training.
  • The model and tokenizer files (emotion_classification_rnn.h5, tokenizer.pkl, and label_encoder.pkl) should be in the same directory as the Streamlit app or provide the correct path.