- Language : python3
- Source : TMDB 5000 dataset from Kaggle, MovieLens data (ml-latest-small)
- Description : This is a movie recommender system. When you run the program, it asks which movies you like and dislike and, based on your preferences, recommends similar movies. For evaluation, a dataset of actual users' ratings is used. Missing entries in the rating data are filled in using the Pearson correlation coefficient.
- This is an individual project for the Information Retrieval course at Uppsala University.
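
The description says missing ratings are filled in using the Pearson correlation coefficient. One common way to do this is user-based collaborative filtering, sketched below. This is only an illustration under that assumption and is not taken from the project's scripts: the function names `pearson` and `predict_rating`, the dictionary layout, and the toy ratings are all hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two users over the movies both have rated."""
    common = [m for m in a if m in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / len(common)
    mean_b = sum(b[m] for m in common) / len(common)
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def predict_rating(ratings, user, movie):
    """Estimate a missing rating from other users' mean-centred ratings,
    weighted by their Pearson similarity to the target user."""
    target = ratings[user]
    target_mean = sum(target.values()) / len(target)
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = pearson(target, other_ratings)
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        num += sim * (other_ratings[movie] - other_mean)
        den += abs(sim)
    # Fall back to the user's own mean when no similar user rated the movie.
    return target_mean if den == 0 else target_mean + num / den

# Toy data (hypothetical): user -> {movie id -> rating}
ratings = {
    "u1": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "u2": {"m1": 4.0, "m2": 2.0, "m3": 3.0, "m4": 5.0},
    "u3": {"m1": 1.0, "m2": 5.0, "m4": 2.0},
}
filled = predict_rating(ratings, "u1", "m4")
```

The prediction is not clipped to the rating scale here; a real pipeline would probably clamp it to the valid range of the ratings data.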
- cosine similarity model : extract features, find the N most similar movies, concatenate
- k-means clustering model : integrate the feature information into one string, then cluster
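
As a rough illustration of the cosine similarity model, the sketch below finds the N most similar movies for each liked movie and concatenates the results into one recommendation list. It assumes each movie already has a numeric feature vector; the names `cosine_similarity`, `most_similar`, and `recommend`, the toy vectors, and the reading of "concatenate" as joining the per-movie result lists are assumptions, not the project's actual code.

```python
import heapq
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(features, title, n):
    """Return the n titles whose vectors are closest to the given title's."""
    query = features[title]
    candidates = ((cosine_similarity(query, vec), other)
                  for other, vec in features.items() if other != title)
    return [other for _, other in heapq.nlargest(n, candidates)]

def recommend(features, liked, n):
    """Concatenate the top-n neighbours of every liked movie, skipping
    movies the user already named and avoiding repeats."""
    recs = []
    for title in liked:
        for other in most_similar(features, title, n):
            if other not in liked and other not in recs:
                recs.append(other)
    return recs

# Toy feature vectors (hypothetical values, for illustration only)
features = {
    "Alien": [1.0, 0.0, 1.0],
    "Aliens": [1.0, 0.1, 0.9],
    "Notting Hill": [0.0, 1.0, 0.0],
    "Love Actually": [0.1, 1.0, 0.0],
}
recs = recommend(features, ["Alien", "Notting Hill"], n=1)
```

`heapq.nlargest` avoids sorting the whole catalogue when only a handful of neighbours is needed.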
- Precision, Recall, F1 score
- Mean average precision
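
The metrics above can be sketched for ranked recommendation lists as follows. This is a minimal version, assuming each recommendation list contains no duplicates and that average precision is normalised by the number of relevant items (one of several conventions); the function names and sample data are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F1 for one recommendation list."""
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def average_precision(recommended, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Average of average_precision over (recommended, relevant) pairs."""
    return sum(average_precision(rec, rel) for rec, rel in runs) / len(runs)

# Toy example: two users' ranked recommendations and their relevant movies
runs = [
    (["a", "b", "c", "d"], {"a", "c", "e"}),
    (["x", "y"], {"y"}),
]
p, r, f1 = precision_recall_f1(*runs[0])
map_score = mean_average_precision(runs)
```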
- Please download the .csv files from
- https://www.kaggle.com/tmdb/tmdb-movie-metadata -> size: 9 MB
- https://grouplens.org/datasets/movielens/latest/ -> ml-latest-small.zip (size: 1 MB)
- Type the commands below in a terminal
python3 doc2vec_features.py
python3 train_kmeans.py
python3 make_dummy_eval.py
python3 fill_dummpy.py
- This takes quite a long time, so prepare a movie and watch it...
- Execute the main file
python3 sujoungs_recommender.py
- Simply execute the main file
python3 sujoungs_recommender.py
- tmdb_5000_movies.csv
- tmdb_5000_credits.csv
- ratings.csv
- links.csv
- numpy
- pandas
- json (part of the Python standard library)
- gensim
- scikit-learn
- nltk
- heapq (part of the Python standard library)
- yellowbrick