NLP Challenge: IMDB Dataset of 50K Movie Reviews to perform Sentiment analysis

yahya010/NLP_CHALLENGE

NLP Sentiment Analysis Challenge

Overview

This repository is dedicated to the Sentiment Analysis challenge on the IMDB Dataset of 50K Movie Reviews. The objective is to apply Natural Language Processing (NLP) techniques to accurately determine the sentiment of movie reviews.

Classifiers

In this project, we compare several machine learning approaches to sentiment classification. The models and supporting techniques include:

  • Random Forest
  • K-Nearest Neighbors (K-NN)
  • Multinomial Naive Bayes
  • TF-IDF vectorization, a feature extraction method (not a classifier) that turns review text into numeric features a classifier can consume
  • BERT (Bidirectional Encoder Representations from Transformers), a pretrained transformer-based language model
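
To make one of the listed models concrete, here is a minimal sketch of Multinomial Naive Bayes on word counts with add-one (Laplace) smoothing, written in plain Python. It is an illustration only, not the code from the notebooks, which may well rely on a library such as scikit-learn instead. The function names (`tokenize`, `train_multinomial_nb`, `predict`) and the toy reviews are hypothetical and are not taken from this repository or the dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real preprocessing would be richer.
    return text.lower().split()


def train_multinomial_nb(texts, labels, alpha=1.0):
    """Fit class priors and smoothed per-class word log-likelihoods."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}, "vocab": vocab}
    for label, n_docs in class_docs.items():
        model["log_prior"][label] = math.log(n_docs / len(labels))
        # Denominator: all tokens seen in this class plus the smoothing mass.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, text):
    # Words outside the training vocabulary are ignored.
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    scores = {
        label: prior + sum(model["log_likelihood"][label][t] for t in tokens)
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)


# Toy, made-up reviews for illustration only (not taken from the dataset).
train_texts = [
    "a wonderful moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull terrible waste of time",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_multinomial_nb(train_texts, train_labels)
prediction = predict(model, "terrible and boring")
print(prediction)
```

The sketch works on raw counts to stay short. Feeding TF-IDF weights instead of counts is a common variation, though whether the notebooks do so is not stated here.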

Dataset

The dataset used in this challenge consists of 50,000 movie reviews from the IMDB database. Each review is labeled as positive or negative, providing a binary classification target for sentiment analysis.
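
As a rough sketch of how the labeled reviews could be read and prepared for binary classification, the example below parses CSV rows, maps the labels to 0/1, and makes a shuffled train/test split. It assumes the CSV has `review` and `sentiment` columns, as in the commonly distributed version of this dataset; check the actual file in data/ before relying on that. The in-memory sample rows, the helper names, and the 25% test fraction are all hypothetical.

```python
import csv
import io
import random

# Made-up rows standing in for the real file; the column names are an assumption.
SAMPLE_CSV = """review,sentiment
"A touching story, beautifully shot.",positive
"Flat characters and a predictable plot.",negative
"One of the best films I have seen this year.",positive
"I walked out after an hour.",negative
"""

LABEL_TO_INT = {"negative": 0, "positive": 1}


def load_reviews(file_obj):
    """Read (text, label) pairs from a CSV file object."""
    texts, labels = [], []
    for row in csv.DictReader(file_obj):
        texts.append(row["review"])
        labels.append(LABEL_TO_INT[row["sentiment"]])
    return texts, labels


def train_test_split(texts, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(texts, train_idx), pick(labels, train_idx),
            pick(texts, test_idx), pick(labels, test_idx))


texts, labels = load_reviews(io.StringIO(SAMPLE_CSV))
train_texts, train_labels, test_texts, test_labels = train_test_split(texts, labels)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points at the CSV in data/.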

Repository Structure

  • data/: Directory containing the IMDB dataset and any additional data files used in the analyses.
  • notebooks/: Jupyter notebooks with detailed analyses and model training steps.
  • models/: Serialized versions of the trained models ready for inference.
  • reports/: Generated reports and visualizations that summarize the findings and model performances.

Results

The models are evaluated on accuracy, precision, recall, and F1-score, so that performance is not judged by a single metric. Detailed results and discussion are in the Jupyter notebooks in the notebooks/ directory.
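
For reference, the four metrics can be computed from the confusion-matrix counts as in the sketch below. The function name `binary_metrics` and the example labels are hypothetical and do not reflect actual results from this project.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "positive", "negative", "positive"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four metrics come out to 0.75.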

Contributing

Contributions to the NLP Sentiment Analysis Challenge are welcome! Please refer to CONTRIBUTING.md for guidelines on how to contribute to this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any queries or discussions regarding the project, please open an issue in this repository.


Happy coding!
