Introduction to Statistical Learning with Python :)
A Python version of the labs from the classic book An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, published by Springer.
The labs were originally written in R. I think it would be great to rewrite each lab in Python using Jupyter notebooks and share them with everybody.
The code uses the standard Python data science stack: NumPy, SciPy, pandas, matplotlib and, of course, scikit-learn.
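To give a feel for how these libraries fit together, here is a minimal sketch of a simple linear regression in the spirit of lab 3. It is not taken from the notebooks: the data are synthetic (generated with NumPy purely for illustration, not a dataset from the book), and the names `df`, `model`, `slope` and `intercept` are assumptions of this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data, assumed for illustration only: y = 2 + 3x + Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=200)

# pandas holds the data in a labelled table, as the labs do with the book's datasets.
df = pd.DataFrame({"x": x, "y": y})

# scikit-learn expects a 2-D feature matrix, hence the double brackets.
model = LinearRegression().fit(df[["x"]], df["y"])

intercept = float(model.intercept_)
slope = float(model.coef_[0])
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The fitted coefficients should land close to the true values of 2 and 3 used to generate the data. The actual labs work with the book's datasets and may lean on SciPy and matplotlib as well.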
The book is available for free on the book's page.
The labs are the following:
- 2 - Introduction to R
- 3 - Linear Regression
- 4 - Logistic Regression, LDA, QDA and KNN
- 5 - Cross-Validation and the Bootstrap
- 6.1 - Subset Selection Methods
- 6.2 - Ridge Regression and the Lasso
- 6.3 - PCR and PLS Regression
- 7 - Non-linear Modeling
- 8 - Decision Trees
- 9 - Support Vector Machines
- 10.1 - Principal Component Analysis
- 10.2 - Clustering
- 10.3 - NCI60 Data Example
I do not own any of the original content of the ISLR book. These notebooks are just coding exercises that can be useful or at least interesting to some readers.