TMDB Box Office Revenue Predictions

Description

This repository contains project work for the Machine Learning Laboratory (MSDS 699) course in the University of San Francisco's Master's in Data Science program.

The data comes from the Kaggle TMDB Box Office Prediction competition: https://www.kaggle.com/c/tmdb-box-office-prediction/data

Contributors:

Goal

The goal of our project was to predict the revenue of movies at the box office.

Process

  1. Data processing, which included missing value imputation and encoding categorical features.
  2. Feature engineering: deriving meaningful features such as the age of the movie.
  3. Building a pipeline to fit various machine learning models.
  4. Evaluating the models using relevant metrics and defining a North Star metric.
  5. Choosing the best model and visually inspecting our results.
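
Steps 1 and 2 can be illustrated with a minimal, standard-library sketch. The records, the column names (budget, genre, release_year), the choice of median imputation, and the reference year are all assumptions made for illustration; they are not taken from the notebook or from the Kaggle schema.

```python
from statistics import median

# Hypothetical toy records; column names are assumptions, not the real schema.
movies = [
    {"budget": 100.0, "genre": "Action", "release_year": 2010},
    {"budget": None, "genre": "Drama", "release_year": 1999},
    {"budget": 40.0, "genre": None, "release_year": 2015},
]


def impute_median(rows, column):
    """Replace missing numeric values with the median of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill


def one_hot(rows, column, missing_label="Unknown"):
    """Replace a categorical column with one 0/1 indicator column per category."""
    for r in rows:
        if r[column] is None:
            r[column] = missing_label
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for c in categories:
            r[f"{column}_{c}"] = 1 if value == c else 0


def add_movie_age(rows, reference_year):
    """Derive the age of each movie relative to a fixed reference year."""
    for r in rows:
        r["age"] = reference_year - r["release_year"]


impute_median(movies, "budget")
one_hot(movies, "genre")
add_movie_age(movies, reference_year=2020)  # reference year is an assumption
```

After these calls, each record holds only numeric values, which is the form most regression models expect.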

Summary

For our problem statement, we chose a linear model (Ridge Regression) as our baseline.
We fit the following models to our data:

  1. Ridge Regression
  2. KNeighborsRegressor
  3. BayesianRidge
  4. RandomForestRegressor
  5. XGBoost
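
As a rough sketch of how several of these models could be fit and compared through a common pipeline, the following uses scikit-learn on synthetic data. The data, the StandardScaler step, the hyperparameters, and the train/test split are assumptions for illustration, not the notebook's actual setup. XGBoost is left out so that the example depends on scikit-learn and NumPy only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge, Ridge
from sklearn.metrics import median_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 3 numeric features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Ridge": Ridge(),
    "KNeighborsRegressor": KNeighborsRegressor(),
    "BayesianRidge": BayesianRidge(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit every model behind the same preprocessing step and score it on held-out data.
scores = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    scores[name] = median_absolute_error(y_test, pipe.predict(X_test))

# Lower median absolute error is better.
best = min(scores, key=scores.get)
```

Which model comes out best on this synthetic data says nothing about the project's results; the point is only the shared fit-and-score loop.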

Of these models, the RandomForest model performed best:

  1. Median absolute error (MedAE, in millions of dollars) - 12.98 (North Star metric)
  2. R2 score - 0.85
  3. RMSLE - 1.68
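
For reference, the three metrics can be computed as in the sketch below, which uses the standard definitions and only the Python standard library. The sample values are made up and are not project results.

```python
import math
from statistics import median


def medae(y_true, y_pred):
    """Median absolute error: the median of |actual - predicted|."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x)."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


# Made-up revenues in millions of dollars (assumption, for illustration only).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 40.0]

print(medae(y_true, y_pred))
print(r2_score(y_true, y_pred))
print(rmsle(y_true, y_pred))
```

MedAE is less sensitive to a few extreme errors than a mean-based metric, which matters when a handful of blockbusters dominate the revenue distribution.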

Takeaways:

  1. 'Budget' is the most important predictor according to permutation feature importance, which is consistent with business intuition.
  2. It is difficult to predict a movie's box office performance accurately because several relevant factors are missing from the data, such as:
    • The overall economy at the time of the movie's release
    • The quality of the movie's plot and other exogenous factors
    • The presence of streaming services such as Amazon Prime and Netflix
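
The idea behind permutation feature importance can be sketched as follows: shuffle one feature column at a time and measure how much the model's error grows. This is a simplified, standard-library illustration. The stand-in predict function, the feature order (budget, runtime, noise), and the use of mean absolute error are assumptions, not the notebook's implementation.

```python
import random


def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in error after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            permuted = mean_abs_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in model: strong dependence on budget, weak on runtime,
# and none on the third feature.
def predict(row):
    budget, runtime, _noise = row
    return 3.0 * budget + 0.1 * runtime


data_rng = random.Random(1)
X = [
    [data_rng.uniform(1, 100), data_rng.uniform(80, 180), data_rng.uniform(0, 1)]
    for _ in range(200)
]
y = [predict(row) for row in X]

importances = permutation_importance(predict, X, y)
```

In this toy setup the budget column gets the largest importance and the unused third column gets exactly zero, because shuffling it leaves every prediction unchanged.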

To run our notebook and reproduce the results, follow these steps:

1) Setup

Clone the repository with the following command:

git clone https://github.com/ShreejayaB/TMDB-Box-Office-Predictions

2) Creating Virtual Environment

  • Run the following command to create the virtual environment named 'tmdb_box_office_pred_ml':
conda env create -f tmdb_box_office_pred_venv.yml -n tmdb_box_office_pred_ml
  • Activate this virtual environment with the following command:
conda activate tmdb_box_office_pred_ml

3) Start IPython

Start the notebook server from the repository's root directory with the following command:
jupyter notebook
