Movie-Recommendation-on-IMDB-Dataset

The dataset is the IMDB top 250 English movies, which can be downloaded from: https://data.world/studentoflife/imdb-top-250-lists-and-5000-or-so-data-records.
In this dataset there are 250 movies (rows) and 38 attributes (columns).
I used the Rapid Automatic Keyword Extraction (RAKE) library. RAKE is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
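To make the RAKE idea concrete, here is a minimal pure-Python sketch of the scoring step. It is not the library used in this project: the tiny stopword list, the function names, and the sample plot text are all assumptions made up for illustration. Each word is scored as degree / frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; a real run would use a full list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "by", "with", "for", "on"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # co-occurrence degree: total length of phrases containing the word
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up plot summary, not taken from the dataset.
plot = "A retired hitman seeks revenge. The hitman is hunted by a crime syndicate."
ranked = rake_scores(plot)
print(ranked[:2])
```

Longer phrases whose words rarely appear elsewhere score highest, which is why multi-word key phrases tend to rise to the top.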
This project is a content-based recommender built with natural language processing (NLP).
Strategy:
Count Vectorizer + Cosine Similarity

  • Count Vectorizer: converts sentences into vectors of word counts.
  • Cosine Similarity: measures similarity as the cosine of the angle between two vectors.
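
The strategy above can be sketched in pure Python. This is only an illustration under stated assumptions: the three movie titles and their keyword strings are hypothetical, and the helper names (`count_vector`, `cosine_similarity`, `recommend`) are invented here. On the real dataset, scikit-learn's `CountVectorizer` and `cosine_similarity` do the same job at scale.

```python
import math
from collections import Counter


def count_vector(text, vocabulary):
    """Turn a string into a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


# Hypothetical keyword strings, standing in for the extracted key phrases.
movies = {
    "Movie A": "crime drama prison friendship",
    "Movie B": "crime drama mafia family",
    "Movie C": "space adventure robot",
}

vocabulary = sorted({word for text in movies.values() for word in text.split()})
vectors = {title: count_vector(text, vocabulary) for title, text in movies.items()}


def recommend(title, top_n=2):
    """Rank the other movies by cosine similarity to the given title."""
    target = vectors[title]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(recommend("Movie A"))
```

Because cosine similarity depends only on the angle between vectors, a movie with a long keyword list is not favored over one with a short list.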