To get set up for this live coding session, clone this repository. You can do so by executing the following in your terminal:
git clone https://github.com/UW-Side-Project-Club/hands-on-ml.git
Alternatively, you can download the zip file of the repository from the top of its main page. If you prefer not to use git or don't have experience with it, this is a good option.
If you do not already have the Anaconda distribution of Python 3, go get it (n.b., you can also do this without Anaconda by using pip to install the required packages; however, Anaconda is great for data science and I encourage you to use it).
Navigate to the relevant directory, hands-on-ml, and install the required packages in a new conda environment:
conda env create -f environment.yml
This will create a new environment called hands_on_ml. With conda 4.4 or later, conda activate hands_on_ml works on every platform. On older versions, to activate the environment on OSX/Linux, execute
source activate hands_on_ml
On Windows, execute
activate hands_on_ml
In the terminal, execute
jupyter notebook
Then open the notebook SupervisedLearning.ipynb and we're ready to get coding. Enjoy.
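If the notebook fails on an import, it can help to confirm that the environment was created correctly. The following is a minimal sketch, not part of the repository: the package names in ASSUMED_PACKAGES are a guess at what environment.yml installs, so replace them with the names actually listed there. Run it with the hands_on_ml environment active.

```python
import importlib.util

# Hypothetical package list (an assumption, not read from environment.yml):
# replace with the top-level module names the notebook actually imports.
ASSUMED_PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    missing = []
    for name in packages:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages were found.")
```

If anything is reported missing, the most likely causes are that the environment is not activated or that conda env create did not finish successfully.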
- https://www.datacamp.com/community/tutorials/kaggle-machine-learning-eda
- https://www.datacamp.com/community/tutorials/kaggle-tutorial-machine-learning
- http://shop.oreilly.com/product/0636920052289.do
- https://www.datacamp.com/community/tutorials/machine-learning-python
- https://github.com/datacamp/datacamp_facebook_live_titanic
- https://github.com/ageron/handson-ml
- Joel Grus, Data Science from Scratch (O’Reilly). This book presents the fundamentals of Machine Learning and implements some of the main algorithms in pure Python (from scratch, as the name suggests).
- Stephen Marsland, Machine Learning: An Algorithmic Perspective (Chapman and Hall). This book is a great introduction to Machine Learning, covering a wide range of topics in depth, with code examples in Python (also from scratch, but using NumPy).
- William McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (O’Reilly).
- Note: You will find free versions of all these books with a quick Google search!
Finally, a great way to learn is to join ML competition websites such as Kaggle. This will allow you to practice your skills on real-world problems, with help and insights from some of the best ML professionals out there.
When you are learning about Machine Learning, it is best to experiment with real-world data, not just artificial datasets. Fortunately, there are thousands of open datasets to choose from, ranging across all sorts of domains. Here are a few places you can look to get data:
- Popular open data repositories:
- Meta portals (they list open data repositories):
- Other pages listing many popular open data repositories:
- TCGA (The Cancer Genome Atlas) data: