This repository is dedicated to the Sentiment Analysis challenge on the IMDB Dataset of 50K Movie Reviews. The objective is to apply Natural Language Processing (NLP) techniques to accurately determine the sentiment of movie reviews.
In this project, we explore the effectiveness of several machine learning approaches for sentiment classification. The models and techniques include:
- Random Forest
- K-Nearest Neighbors (K-NN)
- Multinomial Naive Bayes
- TF-IDF Vectorization as a feature extraction method
- BERT (Bidirectional Encoder Representations from Transformers) as a state-of-the-art language model
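As a rough illustration of how TF-IDF features can feed a Multinomial Naive Bayes classifier, here is a minimal from-scratch sketch in plain Python. It is not the code used in the notebooks. The tokenizer, the names `fit_idf`, `tfidf` and `SimpleMultinomialNB`, and the toy reviews are all hypothetical. The IDF formula mirrors scikit-learn's smoothed default, but the sketch skips the L2 normalization that `TfidfVectorizer` applies by default.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes (deliberately simple).
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs):
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term counts weighted by IDF; terms unseen when fitting the IDF are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


class SimpleMultinomialNB:
    """Hypothetical stand-in for a Multinomial NB classifier over weighted term vectors."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        vocab = set()
        for vec, y in zip(vectors, labels):
            for t, w in vec.items():
                totals[y][t] += w
                vocab.add(t)
        self.vocab = vocab
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, vec):
        # Choose the class maximizing log P(c) + sum_t w_t * log P(t | c).
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_lik[c][t] for t, w in vec.items() if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy usage on made-up reviews (not IMDB data).
train = ["a great and moving film", "wonderful acting, great story",
         "boring plot and terrible acting", "a dull, terrible mess"]
labels = ["positive", "positive", "negative", "negative"]
idf = fit_idf(train)
model = SimpleMultinomialNB().fit([tfidf(d, idf) for d in train], labels)
print(model.predict(tfidf("great story", idf)))  # → positive
```

Feeding TF-IDF weights instead of raw counts into Multinomial NB is a common practical variant. The classic formulation uses integer term counts.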
The dataset used in this challenge consists of 50,000 movie reviews from IMDB. Each review is labeled as either positive or negative, which provides a binary classification target for sentiment analysis.
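Below is a minimal sketch of loading the reviews and turning the positive/negative labels into integer targets. It assumes the data is a CSV file with `review` and `sentiment` columns, as in the commonly distributed Kaggle release. The actual file name and layout under data/ are not specified here. `load_reviews`, `split_dataset` and the 0/1 label mapping are hypothetical helpers.

```python
import csv
import random

# Assumed mapping from the dataset's string labels to integer targets.
LABELS = {"negative": 0, "positive": 1}


def load_reviews(path):
    """Return (texts, labels) from a CSV with 'review' and 'sentiment' columns.

    The column names are an assumption about the file layout."""
    texts, labels = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            texts.append(row["review"])
            labels.append(LABELS[row["sentiment"].strip().lower()])
    return texts, labels


def split_dataset(texts, labels, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out `test_fraction` of them for testing."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([texts[i] for i in train_idx], [texts[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])
```

Typical use would be `texts, labels = load_reviews(path)` followed by `train_x, test_x, train_y, test_y = split_dataset(texts, labels)`. The fixed seed keeps the split reproducible across runs.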
The repository is organized as follows:
- data/: Directory containing the IMDB dataset and any additional data files used in the analyses.
- notebooks/: Jupyter notebooks with detailed analyses and model training steps.
- models/: Serialized versions of the trained models, ready for inference.
- reports/: Generated reports and visualizations that summarize the findings and model performance.
The models are evaluated on accuracy, precision, recall, and F1-score to give a comprehensive picture of their performance. Detailed results and discussion are presented in the Jupyter notebooks in the notebooks/ directory.
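For reference, here is a minimal sketch of how the four evaluation metrics are computed for a binary task. It assumes positive reviews are encoded as `1` and treated as the positive class. `binary_metrics` is a hypothetical helper. In practice, scikit-learn's `sklearn.metrics` functions such as `accuracy_score`, `precision_score`, `recall_score` and `f1_score` compute the same quantities.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall.
    Returns 0.0 wherever a ratio would divide by zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.6, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

Reporting precision and recall alongside accuracy shows whether a model's errors lean toward false positives or false negatives, which accuracy alone hides.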
Contributions to the NLP Sentiment Analysis Challenge are welcome! Please refer to CONTRIBUTING.md for guidelines on how to contribute to this project.
This project is licensed under the MIT License - see the LICENSE file for details.
For any queries or discussions regarding the project, please open an issue in this repository.
Happy coding!