2015.09.24: random forest
Steven gave an overview of Julia. Ben played Andrew Ng's introduction to machine learning lecture video from his course on Coursera.
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
We then discussed different ideas in machine learning.
- POMDPs for Dummies - partially observable Markov decision processes
- Supervised vs. unsupervised learning
- Decision tree learning (Wikipedia), and more specifically random forests
- Boosting vs. bagging (see the sketch after this list)
- Classification
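To make the boosting-vs-bagging distinction concrete, here is a minimal sketch (not something we wrote at the meetup) that builds both kinds of ensemble from the same shallow decision trees; it assumes scikit-learn is installed and uses a synthetic dataset.

```python
# Sketch: bagging vs. boosting, both built from shallow decision trees.
# Assumes scikit-learn; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: train many trees independently on bootstrap samples, then vote/average.
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                            n_estimators=50, random_state=0)

# Boosting: train trees sequentially, each one focusing on the previous trees' errors.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                              n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```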
Ben showed another video on linear regression, but should have shown this more specific follow-on video giving a very high-level overview of random forests.
Then Haroldur and Steven explained how support vector machines are essentially linear classifiers that project the data into additional dimensions, so that a complex classification boundary appears in the original, lower-dimensional space. They also warned that SVMs are notoriously prone to overfitting.
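For anyone who wants to see the kernel idea in action, here is a small scikit-learn sketch (not from the session, and the data is synthetic): on concentric-circle data a linear SVM does poorly, while an RBF kernel implicitly lifts the points into a higher-dimensional space where a linear separator exists.

```python
# Sketch: linear vs. RBF-kernel SVM on data that is not linearly separable.
# Assumes scikit-learn.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, noise=0.1, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))

# A very large C or gamma lets the RBF boundary wrap around individual points,
# which is how an SVM can badly overfit; watch the held-out accuracy.
```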
Next we split into groups and learned about random forests (an overfitting-resistant ensemble method built from decision trees).
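As a rough illustration of why a forest resists overfitting better than a single tree, here is a minimal sketch assuming scikit-learn and a synthetic dataset (it is not taken from the Titanic tutorial linked below).

```python
# Sketch: one deep decision tree vs. a random forest of such trees.
# Assumes scikit-learn; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=25,
                           n_informative=5, random_state=1)

# A single unpruned tree tends to memorize the training data.
tree = DecisionTreeClassifier(random_state=1)

# A forest averages many trees, each grown on a bootstrap sample with a random
# subset of features considered at each split, which damps the overfitting.
forest = RandomForestClassifier(n_estimators=200, random_state=1)

for name, model in [("single tree", tree), ("random forest", forest)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```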
- Check out this continued Python Titanic tutorial based on Kaggle's Titanic challenge, including an excellent discussion of why you want to use decision trees, how they fall short, and what you can do to make them more accurate.
- Everything tree-related from this GitHub repo accompanying Andrew Ng's Practical Machine Learning
- Courses:
  - Python data science tracks | Dataquest
  - R tutorials on dplyr, data.table, ggvis, R Markdown & more | DataCamp
  - DataScienceSpecialization/courses - a free GitHub repo with last year's Johns Hopkins Data Science certificate curriculum, which is now a paid offering on Coursera
- How to get better at data science | DataSchool.io, with book recommendations
Links based on some of Morgan's work, and Ben's googling:
- Google Prediction API
- TextBlob: Simplified Text Processing — TextBlob 0.10.0-dev documentation: TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more (see the short sketch after this list).
- Bag of Words Meets Bags of Popcorn | Kaggle
- Haroldur suggested Statistics for Hackers: https://speakerdeck.com/jakevdp/statistics-for-hackers
- Both Steven and Haroldur expressed interest in discussing Bayesian methods. Haroldur mentioned Probabilistic Programming and Bayesian Methods for Hackers, a book made up of IPython notebooks.
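For the TextBlob bullet above, here is a minimal usage sketch; the example sentence is made up, and running it locally also requires downloading the NLTK corpora first (`python -m textblob.download_corpora`).

```python
# Sketch: a few of TextBlob's common NLP calls on a made-up sentence.
from textblob import TextBlob

blob = TextBlob("The movie was surprisingly good, but the ending felt rushed.")

print(blob.tags)          # part-of-speech tags, e.g. ('movie', 'NN')
print(blob.noun_phrases)  # noun phrase extraction
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
```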