INF554 course @ École polytechnique about Machine Learning
Remark: all the `make` commands should be run in the folder of the corresponding project (see 'List of the projects' below)
- Go into the TD project folder and run the `make` command you want (see 'List of the projects' below)
- Run `python main.py`. Remark: if some libraries are missing, you might need to run `make dep` to install them
- To learn more about the available flags, use `python main.py --help`:
  - `python main.py --PCA=True` to apply PCA (Principal Component Analysis)
  - `python main.py --degreeMax=7` to set the maximum polynomial degree to 7
  - any combination of the two flags
- Fork the repo by clicking on the fork button, then clone your fork on your computer
- Modify the code as desired
- Follow one of the two 'Ready to use' options above to execute your code
- Introduction to the Machine Learning Pipeline
- Pipeline explanation:
  - Loading and inspecting the provided data (temperature, soil moisture, and number of new cells).
  - Preprocessing: removing an outlier, normalizing all the inputs, and expanding the feature space with polynomial basis functions so a linear model can fit the data.
  - Using a simple Least Squares regression to predict the number of new cells from the temperature and soil moisture, via linear combinations of polynomial functions.
  - Illustrating overfitting with high-degree polynomial functions.
- Pipeline in action:
  - `make all` to run it
  - possibility to add the PCA (Principal Component Analysis) or degreeMax flags
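The heart of this pipeline (a polynomial basis expansion followed by a least-squares fit) can be sketched in a few lines. The data below is a synthetic stand-in, since the TD's actual files and column names are not part of this README:

```python
import numpy as np

# Synthetic stand-ins for the TD data (temperature, soil moisture -> new cells)
rng = np.random.default_rng(0)
temp = rng.uniform(10, 30, 50)
moisture = rng.uniform(0, 1, 50)
cells = 2.0 + 0.5 * temp + 3.0 * moisture + 0.1 * temp * moisture \
        + rng.normal(0, 0.1, 50)

def polynomial_design(x1, x2, degree):
    """All monomials x1^i * x2^j with i + j <= degree (the basis expansion)."""
    cols = [x1**i * x2**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)

X = polynomial_design(temp, moisture, degree=2)
w, *_ = np.linalg.lstsq(X, cells, rcond=None)   # least-squares fit
pred = X @ w
print(np.abs(pred - cells).mean())              # mean absolute residual
```

Raising `degree` far beyond what the data supports is exactly how the TD illustrates overfitting.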
- Use of 2 supervised learning methods:
  - `make kNN` to recognize handwritten digits with the k-Nearest Neighbors algorithm
  - `make LR` to use the Logistic Regression method to find the decision boundary in a binary classification problem. Our use case is to determine whether a student will be admitted to university based on their grades on 2 exams. To do so, we minimize a cost function (the errors in the prediction of admission) using a batch gradient descent method (from scipy, or a mini-batch gradient descent I wrote, by uncommenting it at the end of TD2/LR/main.py)
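The k-Nearest Neighbors vote used for the digits can be sketched on toy 2-D points (the real TD works on pixel feature vectors, not shown here):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)        # Euclidean distances
    nearest = y_train[np.argsort(dists)[:k]]           # labels of the k closest
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                   # majority vote

# Two toy clusters standing in for digit feature vectors
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.05, 0.0])))  # → 0
```

There is no training phase: all the work happens at prediction time, which is why kNN scales poorly with dataset size.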
- Various methods:
  - Feature Selection: `make FS`
  - SVD: `make SVD`
  - PCA: `make PCA`
  - NMF: `make NMF`
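The PCA and SVD targets are closely related: PCA can be computed from the SVD of the centered data. A minimal sketch on synthetic data (the TD's own datasets are not shown here):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via the SVD of the centered data."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # scores in the reduced space

rng = np.random.default_rng(1)
# 3-D data that actually lies near a 2-D plane
Z = rng.normal(size=(100, 2))
X = Z @ rng.normal(size=(2, 3)) + rng.normal(scale=0.01, size=(100, 3))
scores = pca(X, 2)
print(scores.shape)   # (100, 2)
```

The rows of `Vt` are the principal directions, sorted by explained variance, so truncating to `n_components` keeps the most informative subspace.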
- Introduction to TensorFlow
  - Neural Network to learn on the MNIST dataset: `make NN` to launch the model
  - Neural Network to find the type of landscape based on characteristics of the picture: `make HW` to launch the model (and a 2-minute training)
  - The report is available here
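The TD builds these classifiers with TensorFlow; as a framework-free illustration of what one forward pass of such a network computes, here is a numpy sketch (the layer sizes and data are made up, not the TD's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features -> 8 hidden units -> 3 classes
W1 = rng.normal(scale=0.1, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 3)); b2 = np.zeros(3)

def forward(X):
    h = np.maximum(0, X @ W1 + b1)                  # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)         # softmax probabilities

X = rng.normal(size=(5, 4))
y = np.array([0, 1, 2, 1, 0])
p = forward(X)
loss = -np.log(p[np.arange(5), y]).mean()           # cross-entropy loss
print(round(loss, 3))
```

Training then consists of repeatedly lowering this cross-entropy loss by gradient descent on the weights, which TensorFlow automates via automatic differentiation.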
- Introduction to Keras: see the Jupyter notebook
- SVM and boosting on Decision Trees
  - SVM:
    - First example: determining the decision boundary with a linear kernel
    - Second example: Gaussian kernel to find a non-linear decision boundary
    - Third example: Gaussian kernel in the case of a blurred boundary
    - `make SVM1`, `make SVM2`, or `make SVM3`
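The contrast between the first two examples can be reproduced with scikit-learn on toy data (an illustration of the idea, not the TD's actual code or datasets):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Non-linearly separable toy data: the class is inside vs outside a circle
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)

linear = SVC(kernel="linear").fit(X, y)           # linear kernel (first example)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)      # Gaussian kernel (second example)
print(linear.score(X, y), rbf.score(X, y))        # the Gaussian kernel fits far better
```

On a fuzzy boundary (third example), lowering the `C` parameter softens the margin and tolerates misclassified points instead of chasing them.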
  - AdaBoost:
    - Implementation of the AdaBoost method on Decision Trees to improve their accuracy
    - Evaluation of the optimal depth
    - `make AB`
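A minimal, self-contained numpy sketch of AdaBoost on decision stumps (depth-1 trees) shows the reweighting mechanism; the TD's own implementation and data are not reproduced here:

```python
import numpy as np

def stump_predict(X, j, t, s):
    """Depth-1 tree: predict s where X[:, j] > t, and -s elsewhere."""
    return np.where(X[:, j] > t, s, -s)

def fit_stump(X, y, w):
    """Best single-feature threshold under sample weights w (labels in {-1, +1})."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in X[:, j]:
            for s in (1, -1):
                err = w[stump_predict(X, j, t, s) != y].sum()
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best, best_err

def adaboost(X, y, n_rounds=10):
    w = np.full(len(y), 1 / len(y))
    model = []
    for _ in range(n_rounds):
        (j, t, s), err = fit_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # weight of this stump
        w *= np.exp(-alpha * y * stump_predict(X, j, t, s))  # up-weight the mistakes
        w /= w.sum()
        model.append((alpha, j, t, s))
    return model

def predict(model, X):
    return np.sign(sum(a * stump_predict(X, j, t, s) for a, j, t, s in model))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # diagonal boundary: hard for one stump
model = adaboost(X, y)
print((predict(model, X) == y).mean())       # training accuracy
```

Each round concentrates the weights on the points the previous stumps got wrong, which is what lets an ensemble of weak trees approximate the diagonal boundary.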
- Regularized Logistic Regression
  - There is no linear decision boundary between the positives and the negatives
  - We work in a more complex feature space to find one with Logistic Regression: we map our 2 features to the array of all polynomial combinations of the 2 features (below a certain degree)
  - We need some regularization to avoid overfitting, so we add an L2 penalty term on the coefficients
  - Comment/uncomment lines 58-60 of main.py to use a home-made stochastic gradient descent
  - `make RLR`
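The L2-penalized cost and its gradient can be sketched in numpy (the data, feature map, and learning rate below are illustrative, not the TD's actual ones):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    """L2-regularized logistic cost and gradient (intercept theta[0] not penalized)."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy(); reg[0] = 0.0
    eps = 1e-12                        # guards log(0)
    cost = -(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)).mean() \
           + lam / (2 * m) * (reg @ reg)
    grad = X.T @ (h - y) / m + lam / m * reg
    return cost, grad

# Quadratic feature map on toy 2-D data, mirroring the TD's polynomial expansion
rng = np.random.default_rng(0)
P = rng.normal(size=(80, 2))
y = (P[:, 0]**2 + P[:, 1]**2 < 1).astype(float)   # circular (non-linear) boundary
X = np.column_stack([np.ones(80), P, P**2, P[:, :1] * P[:, 1:]])

theta = np.zeros(X.shape[1])
for _ in range(2000):                  # plain batch gradient descent
    c, g = cost_and_grad(theta, X, y, lam=1.0)
    theta -= 0.3 * g
acc = ((sigmoid(X @ theta) > 0.5) == y).mean()
print(acc)                             # training accuracy
```

Raising `lam` shrinks the polynomial coefficients and smooths the boundary, which is the overfitting control the TD demonstrates.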
- Reinforcement Learning
  - Based on the OpenAI Gym suite (the FrozenLake-v0 game)
  - We try to learn a policy to move on a frozen lake with holes
  - The results are non-deterministic (the agent can slip on the ice)
  - We implement some Reinforcement Learning methods:
    - The SARSA algorithm, based on the online update of the Q-function, with an epsilon-greedy exploration strategy
    - The Q-Learning algorithm, an off-policy algorithm that does not estimate the Q-function of its current policy directly but instead estimates the value of another policy (the optimal one). Set `qlearning = True` in main.py
  - We also implemented 2 different exploration strategies:
    - The epsilon-greedy exploration strategy
    - The softmax exploration strategy. Set `softmax = True` in main.py
  - To run the Reinforcement Learning algorithm, run `make`
  - To test that the environment is working fine, you can run `make test`. You should see a dummy inverted pendulum.
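The tabular Q-Learning update with epsilon-greedy exploration can be sketched without Gym; the corridor environment below is a made-up stand-in for FrozenLake-v0:

```python
import numpy as np

# Stand-in environment: a deterministic 5-state corridor with reward 1 at the
# rightmost state (the TD itself runs on Gym's FrozenLake-v0, not this toy MDP)
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)   # a: 0 = left, 1 = right
    return s2, float(s2 == GOAL), s2 == GOAL             # next state, reward, done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))
alpha, gamma, eps = 0.5, 0.9, 0.3

for _ in range(500):                                     # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy behavior policy
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Off-policy Q-Learning update: bootstrap on the greedy action in s2
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q[:4].argmax(axis=1))   # greedy action per non-terminal state (1 = right)
```

SARSA differs only in the bootstrap term: it uses the Q-value of the action the behavior policy actually takes in `s2`, instead of `Q[s2].max()`, which is what makes it on-policy.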
- Unsupervised learning
  - Based on the k-means algorithm: `make KM`
  - Based on Spectral Clustering, to also detect non-convex clusters: `make SC`
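The k-means iteration (Lloyd's algorithm) can be sketched in numpy on toy blobs; the farthest-first initialization used here is a choice made for determinism, not necessarily what the TD does:

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    # Farthest-first initialization: deterministic and spreads the centers out
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Update step: each center moves to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(0)
# Two well-separated toy blobs
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centers = kmeans(X, 2)
print(centers.round(1))   # one center per blob
```

Because k-means assigns by Euclidean distance to the center, it only finds convex clusters, which is exactly the limitation Spectral Clustering lifts.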