Skip to content

Latest commit

 

History

History
107 lines (90 loc) · 7.58 KB

README.md

File metadata and controls

107 lines (90 loc) · 7.58 KB

What is NBA?

  • The National Basketball Association (NBA) is a professional basketball league in North America. It is the premier men's professional basketball league in the world

NBA Data Analysis & NBA Game Winner Team Estimator: Project Overview

  • Created a NBA Game Winner Team Estimator to determine the winner team based on previous data for home team and away team from Season 2003 - 2020
  • Managed to achieve 0.6969 AUC score with the Logistic Regression model
  • Optimized KNN, Logistic Regression, Random Forest Classifiers and XGBoost using GridsearchCV to reach the best model.
  • Built backend API using FastAPI & Built frontend app using Streamlit
  • Live Project Link: https://nba-game-predict-app.herokuapp.com/

Data Source

https://www.kaggle.com/nathanlauga/nba-games

Data Cleaning

  • Every row has 43 columns. Note: Record is calucated by total wins over sum of total wins and total losses
  • Columns: Meaning GAME_ID: ID of match
    G_home: Number of games played on the season of Home Team
    W_PCT_home: Win % on current season of Home Team
    HOME_RECORD_home: Home record on the current season of Home Team
    ROAD_RECORD_home: Road record on the current season of Home Team
    W_PCT_prev_home: Win % on previous season of Home Team
    HOME_RECORD_prev_home: Home record on the previous season of Home Team
    ROAD_RECORD_prev_home: Road record on the previous season of Home Team
    G_away: Number of games played on the current season by Away Team
    W_PCT_away: Win % on current season of Away Team
    HOME_RECORD_away: Home record on the current season of Away Team
    ROAD_RECORD_away: Road record on the current season of Away Team
    W_PCT_prev_away: Win % on previous season of Away Team
    HOME_RECORD_prev_away: Home record on the previous season of Away Team
    ROAD_RECORD_prev_away: Road record on the previous season of Away Team
    WIN_PRCT_home_3g: Mean Win % on previous 3 games of Home Team
    PTS_home_3g: Mean Number of points scored by Home Team on previous 3 games
    FG_PCT_home_3g: Mean Field Goal Percentage by Home Team on previous 3 games
    FT_PCT_home_3g: Mean Free Throw Percentage by Home Team on previous 3 games
    FG3_PCT_home_3g: Mean Three Point Percentage by Home Team on previous 3 games
    AST_home_3g: Mean Assists by Home Team on previous 3 games
    REB_home_3g: Mean Rebounds by Home Team on previous 3 games
    WIN_PRCT_away_3g: Mean Win % by Away Team on previous 3 games
    PTS_away_3g: Mean Number of points scored by Away Team on previous 3 games
    FG_PCT_away_3g: Mean Field Goal Percentage by Away Team on previous 3 games
    FT_PCT_away_3g: Mean Free Throw Percentage by Away Team on previous 3 games
    FG3_PCT_away_3g: Mean Three Point Percentage by Away Team on previous 3 games
    AST_away_3g: Mean Assists by Away Team on previous 3 games
    REB_away_3g: Mean Rebounds by Away Team on previous 3 games
    WIN_PRCT_home_10g: Mean Win % on previous 10 games of Home Team
    PTS_home_10g: Mean Number of points scored by Home Team on previous 10 games
    FG_PCT_home_10g: Mean Field Goal Percentage by Home Team on previous 10 games
    FT_PCT_home_10g: Mean Free Throw Percentage by Home Team on previous 10 games
    FG3_PCT_home_10g: Mean Three Point Percentage by Home Team on previous 10 games
    AST_home_10g: Mean Assists by Home Team on previous 10 games
    REB_home_10g: Mean Rebounds by Away Team on previous 10 games
    WIN_PRCT_away_10g: Mean Win % by Away Team on previous 10 game
    PTS_away_10g: Mean Number of points scored by Away Team on previous 10 games
    FG_PCT_away_10g: Mean Field Goal Percentage by Away Team on previous 10 games
    FT_PCT_away_10g: Mean Free Throw Percentage by Away Team on previous 10 game
    FG3_PCT_away_10g: Mean Three Point Percentage by Away Team on previous 10 games
    AST_away_10g: Mean Assists by Away Team on previous 10 games
    REB_away_10g: Mean Rebounds by Away Team on previous 10 game
    GAME_DATE_EST: Game's date
    SEASON: Season when the game occured
    HOME_TEAM_WINS: Have Home Team Win(Target Variable)

EDA

I have done some EDA for final games data. Out of curiousity, I have done EDA regarding LeBron's stats. Below are some highlights

Overall Games From All Seasons

alt text

  • From stacked bar chart from left side, looking from bottom to top, the light blue bar increases when the mean Win % on previous 3 games of Home Team increases
  • From stacked bar chart from right side, looking from bottom to top, the light blue bar increases when the mean Win % on previous 10 games of Home Team increases

alt text

  • For stacked bar chart from left side, looking from bottom to top, the light blue bar decreases when the mean Win % on previous 3 games of Away Team increases

  • For stacked bar chart from right side, looking from bottom to top, the light blue bar decreases when the mean Win % on previous 10 games of Away Team increases

  • We can conclude that

    • Higher the win % of previous games of home team, the higher chances that the home team will win
    • Higher the win % of previous games of away team, the lower chances that the home team will win

LeBron's stats

alt text alt text

Model Building

First, I use season 2004 - 2018 as train set while season 2019 as test set. I ignore data from season 2020 because of covid-19 which is an unexpected variable & causing the games in season 2020 not relatively balanced. After that, I have prepared standard-scaled data for Logistic Regression model and minmax-scaled data for K-Nearest Neighbors model.

I tried four different models and evaluated them using ROC AUC score. I chose ROC AUC score as this is a imbalanced dataset. Also, it is suitable to use ROC AUC score to evaluate the ability of model to classify true-positive & true-negative.

I tried four different models:

  • Logistic Regression
  • K-Nearest Neighbors
  • Random Forest
  • XGboost

Model performance

The Logistic Regression model slightly outperformed the other approaches using cross validation evaluation

  • Logistic Regression : ROC AUC score = 0.6969
  • K-Nearest Neighbors : ROC AUC score = 0.6519
  • Random Forest. : ROC AUC score = 0.6966
  • XGboost : ROC AUC score = 0.6961

Productionization

In this step, I built a FastAPI backend endpoint & frontend app using Streamlit. Both backend and frontend app are deployed using docker. In the end, both are deployed live using Heroku. The API endpoint takes in a request with a list of values from a home team's stats & away team's stats and returns an estimated outcome of the current game.