This repository contains Jupyter notebooks that document the creation and analysis of a recommender system, covering the complete workflow from exploratory data analysis (EDA) to modeling. I developed the project as part of my applied learning in the University of Chicago's MS in Applied Data Science program.
The project is divided into two main notebooks:
- Recommender_System_End_to_End_(EDA).ipynb: This notebook focuses on the initial data exploration, providing a comprehensive understanding of the dataset's characteristics and the underlying patterns.
- Recommender_System_End_to_End_(Modeling).ipynb: Here, the insights gained from the EDA are put into action. The notebook details the process of building a recommender system, including data preprocessing, model selection, and evaluation.
The project utilizes a dataset consisting of books, ratings, and user information. The key challenge addressed is to recommend books to users based on their preferences and historical data.
- Exploratory Data Analysis (EDA): This step involves an in-depth analysis of the datasets to understand the distributions, correlations, and potential biases in the data. It sets the stage for informed model building.
- Modeling: Various techniques are employed to build the recommender system, including collaborative filtering and content-based filtering. The modeling process is iterative and focuses on improving recommendation accuracy.
- Comprehensive data cleaning and preprocessing
- In-depth exploratory analysis with visualizations
- Implementation of collaborative filtering techniques
- Calculation of cosine similarity scores for recommendations
- Evaluation of model performance with appropriate metrics
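
As a rough illustration of the cosine-similarity step listed above, here is a minimal item-based collaborative filtering sketch. It is not taken from the notebooks: the toy rating matrix and the function names `cosine_similarity_matrix` and `recommend` are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: books); 0 means "not rated".
# These values are invented for illustration and do not come from the project's dataset.
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)


def cosine_similarity_matrix(matrix):
    """Return pairwise cosine similarity between the columns (items) of `matrix`."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    normalized = matrix / norms  # scale each item column to unit length
    return normalized.T @ normalized


def recommend(user_index, ratings, top_n=2):
    """Rank the books a user has not rated by similarity-weighted scores."""
    similarity = cosine_similarity_matrix(ratings)
    user_ratings = ratings[user_index]
    # Score each book as the sum of the user's ratings weighted by item similarity.
    scores = similarity @ user_ratings
    scores[user_ratings > 0] = -np.inf  # exclude books the user already rated
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]


item_similarity = cosine_similarity_matrix(ratings)
recommendations = recommend(0, ratings)
print(recommendations)
```

The sketch scores each unrated book by how similar it is to the books the user already rated. A real pipeline would typically work on a sparse matrix and handle missing ratings more carefully than treating them as zeros.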
To use the notebooks:
- Clone the repository.
- Ensure you have Jupyter Notebook installed.
- Open the notebooks in Jupyter to view, edit, or run the cells.
- Run the EDA notebook on the original datasets. Its outputs are saved as cleaned datasets.
- Using the cleaned datasets, run the Modeling notebook.
Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.
- Kishor Mannur - MS in Applied Data Science at UChicago | Ex-Big Data Engineer at Wipro