TMDB Box Office Revenue Predictions

Description

This repo was created as part of the project work for the Machine Learning Laboratory (MSDS 699) course in the University of San Francisco's Master's in Data Science program.

The data comes from a Kaggle competition: https://www.kaggle.com/c/tmdb-box-office-prediction/data

Contributors:

Goal

The goal of our project was to predict the revenue of movies at the box office.

Process

Data processing, including missing-value imputation and encoding of categorical features.
Feature engineering - deriving meaningful features such as the age of the movie.
Building a pipeline to fit various machine learning models.
Evaluating the models using relevant metrics and defining a North Star metric.
Choosing the best model and visually inspecting our results.
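
The first three steps above can be sketched roughly as follows. This is a minimal illustration using scikit-learn and pandas, not the notebook's actual code: the column names, the toy values, the reference year used to compute the movie's age, and the choice of imputation strategies are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame; column names and values are assumptions, not the real data.
movies = pd.DataFrame({
    "budget": [1.0e7, np.nan, 5.0e7, 2.0e6, 8.0e7, 3.0e7],
    "original_language": ["en", "fr", "en", np.nan, "en", "es"],
    "release_date": pd.to_datetime([
        "2001-05-04", "2010-11-19", "2015-07-01",
        "1995-03-10", "2012-12-14", "2008-08-08",
    ]),
    "revenue": [3.0e7, 1.2e7, 2.1e8, 4.0e6, 3.5e8, 9.0e7],
})

# Feature engineering: age of the movie in years, relative to a fixed year.
REFERENCE_YEAR = 2019  # assumption chosen for illustration
movies["age_years"] = REFERENCE_YEAR - movies["release_date"].dt.year

numeric_features = ["budget", "age_years"]
categorical_features = ["original_language"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill missing values with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_features),
    ("cat", categorical_steps, categorical_features),
])

# The regressor is the last step, so other models can be swapped in here.
model = Pipeline([("preprocess", preprocess), ("regressor", Ridge(alpha=1.0))])

X = movies[numeric_features + categorical_features]
y = movies["revenue"]
model.fit(X, y)
predictions = model.predict(X)
```

Keeping preprocessing inside the pipeline means the imputation and encoding are refit on each training split, which avoids leaking information from held-out data when comparing models.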

Summary

We chose a linear model (Ridge Regression) as our baseline.
We fit the following models to our data:

Ridge Regression
KNeighborsRegressor
BayesianRidge
RandomForestRegressor
XGBoost

Of these models, the RandomForestRegressor performed best:

MedAE (in millions of $) - 12.98 (North Star metric)
R2 score - 0.85
RMSLE - 1.68
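
For readers unfamiliar with these three metrics, the sketch below shows how each one is defined, using only the Python standard library. The revenue figures are made up for illustration and are unrelated to the results above; scikit-learn's `median_absolute_error`, `r2_score`, and `mean_squared_log_error` provide equivalent calculations.

```python
import math
import statistics


def median_absolute_error(y_true, y_pred):
    """Median of |actual - predicted|; a few very large misses barely move it."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean squared error of log(1 + value); penalizes relative error."""
    squared_log_errors = [
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(statistics.fmean(squared_log_errors))


# Made-up revenues in millions of dollars, for illustration only.
actual = [10.0, 50.0, 200.0, 5.0]
predicted = [12.0, 40.0, 180.0, 9.0]

print("MedAE:", median_absolute_error(actual, predicted))  # prints MedAE: 7.0
print("R2:", round(r2_score(actual, predicted), 3))
print("RMSLE:", round(rmsle(actual, predicted), 3))
```

Because MedAE is a median, it reflects the typical miss rather than the few blockbusters where the error is largest, and it is expressed in the same units as revenue.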

Takeaways:

According to permutation feature importance, the movie's 'Budget' is the most important predictor, which makes sense from a business perspective.
It is difficult to predict a movie's box office performance accurately because several relevant factors are missing from the data, such as:
- Overall economy at the time of the movie release
- Quality of the movie’s plot and other exogenous factors
- Presence of streaming services such as Amazon Prime and Netflix
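
The permutation feature importance mentioned in the first takeaway can be sketched as follows with scikit-learn's `permutation_importance`. The data here is synthetic and deliberately constructed so that revenue depends mostly on budget; the feature names and coefficients are assumptions for illustration and do not reproduce the notebook's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 300

# Synthetic features (assumed names); revenue is built to track budget closely.
budget = rng.uniform(1, 100, n_samples)
runtime = rng.uniform(80, 180, n_samples)
popularity = rng.uniform(0, 50, n_samples)
X = np.column_stack([budget, runtime, popularity])
feature_names = ["budget", "runtime", "popularity"]
y = 3.0 * budget + 0.5 * popularity + rng.normal(0, 5, n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and record how much the
# model's score drops; a larger drop means the model relies more on it.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranked = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Unlike the impurity-based importances built into tree models, this measure is computed from predictions on data the model has not seen, so it is less prone to favoring high-cardinality features.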

To run our notebook and reproduce the results, follow these steps:

1) Setup

Clone the repository with the following command:

git clone https://github.com/ShreejayaB/TMDB-Box-Office-Predictions

2) Creating Virtual Environment

Run the following command to create the virtual environment named 'tmdb_box_office_pred_ml':

conda env create -f tmdb_box_office_pred_venv.yml -n tmdb_box_office_pred_ml

Activate this virtual environment with the following command:

conda activate tmdb_box_office_pred_ml

3) Start IPython

Start the notebook server from the repository's root directory with the jupyter notebook command, then open TMDB_box_office_revenue_prediction.ipynb.
