- Project for the course Exploratory Data Analysis and Visualization at UCLA Extension
- Completed in December 2020
Background: Competitive balance, which refers to the degree of uncertainty regarding the outcome of a competition, is frequently debated among soccer fans and has received considerable attention both inside and outside academia.
Data and research question: In this project I analyzed team performances (points per game, win proportions, etc.) in four soccer leagues from the 2015/2016 season to the 2019/2020 season. The main research question was: Which soccer league is the most competitive?
Method and findings: The project was completed in R, except for part of the data cleaning process, which was done in Excel. Through exploratory data analysis and k-means clustering, I found that, in general, Major League Soccer (MLS) was more competitive than the Bundesliga, La Liga, and the Premier League (EPL).
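As a rough illustration of the clustering step, the sketch below computes points per game and win proportion for a toy league table and groups the teams with base R's `kmeans()`. The data frame, its column names (`played`, `wins`, `draws`), the 3-1-0 points rule, and the choice of three clusters are assumptions made for this example only; they are not taken from the project's data set or scripts, which may differ.

```r
# Toy league table with made-up numbers (NOT from the project's data set).
# Column names are hypothetical.
teams <- data.frame(
  team   = c("A", "B", "C", "D", "E", "F"),
  played = c(34, 34, 34, 34, 34, 34),
  wins   = c(25, 22, 14, 13, 6, 5),
  draws  = c(5, 6, 10, 9, 8, 7),
  stringsAsFactors = FALSE
)

# Points per game (assuming 3 points per win, 1 per draw) and win proportion.
teams$ppg      <- (3 * teams$wins + teams$draws) / teams$played
teams$win_prop <- teams$wins / teams$played

# Standardize the features so neither dominates the Euclidean distance.
features <- scale(teams[, c("ppg", "win_prop")])

# k-means starts from random centers, so fix the seed and use several restarts.
set.seed(2020)
fit <- kmeans(features, centers = 3, nstart = 25)

# Attach the cluster labels and list teams from strongest to weakest.
teams$cluster <- fit$cluster
print(teams[order(-teams$ppg), ])
```

One possible reading, offered here only as an assumption, is that a league whose teams fall into widely separated performance tiers is less balanced than one whose teams cluster close together. The full report describes how the clusters were actually interpreted.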
For a complete report, see the wiki page.
- directory for plots created during data visualization
- this document you are currently reading
- directory for academic articles on competitive balance
- final data set
- season-end league tables for all 5 seasons of all 4 leagues
, form-epl.csv
, form-laliga.csv
, form-mls.csv
- form tables for all 5 seasons of the Bundesliga, the EPL, La Liga, and MLS, respectively
- main R script for data wrangling, visualization, and statistical analyses
- R script for creating the final data set
The dataset and R scripts are free for download and use, provided that proper credit is given.
If you mention or use any part of my research report, please provide a link to this repo.