INF554 course @ École polytechnique about Machine Learning
Remark: all the `make` commands should be run in the folder of the corresponding project (see 'List of the projects' below)
- Go into the TD project folder and run the `make` command you want (see 'List of the projects' below)
- Run `python main.py`. Remark: if some libraries are missing, you might need to run `make dep` to install them
- To learn more about the available flags, use `python main.py --help`:
  - `python main.py --PCA=True` to apply PCA (Principal Component Analysis)
  - `python main.py --degreeMax=7` to set the maximum polynomial degree to 7
  - any combination of the two flags
- Fork the repo by clicking on the fork button, then clone your fork on your computer
- Modify the code as desired
- Follow one of the two 'Ready to use' options above to execute your code
- Introduction to the Machine Learning Pipeline
- Pipeline explanation:
  - Loading and inspecting the provided data (temperature, soil moisture, and number of new cells).
  - Preprocessing: removing an outlier, normalizing all the inputs, and expanding the feature space with polynomial basis functions so a linear model can fit the data.
  - Using a simple Least Squares regression to predict the number of new cells from the temperature and soil moisture, via linear combinations of polynomial functions.
  - Illustrating overfitting with high-degree polynomial functions.
- Pipeline in action:
  - `make all` to run it
  - possibility to add the PCA (Principal Component Analysis) or degreeMax flags
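The heart of this pipeline (a polynomial basis expansion followed by a least-squares fit) can be sketched in a few lines. The data below is a synthetic stand-in, since the TD's actual files and column names are not part of this README:

```python
import numpy as np

# Synthetic stand-ins for the TD data (temperature, soil moisture -> new cells)
rng = np.random.default_rng(0)
temp = rng.uniform(10, 30, 50)
moisture = rng.uniform(0, 1, 50)
cells = 2.0 + 0.5 * temp + 3.0 * moisture + 0.1 * temp * moisture \
        + rng.normal(0, 0.1, 50)

def polynomial_design(x1, x2, degree):
    """All monomials x1^i * x2^j with i + j <= degree (the basis expansion)."""
    cols = [x1**i * x2**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.column_stack(cols)

X = polynomial_design(temp, moisture, degree=2)
w, *_ = np.linalg.lstsq(X, cells, rcond=None)   # least-squares fit
pred = X @ w
print(np.abs(pred - cells).mean())              # mean absolute residual
```

Raising `degree` far beyond what the data supports is exactly how the TD illustrates overfitting.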
- Use of 2 supervised learning methods:
  - `make kNN` to recognize handwritten digits with the k-Nearest Neighbors algorithm
  - `make LR` to use the Logistic Regression method to find the decision boundary in a binary classification problem. Our use case is to determine whether a student will be admitted to university based on their grades on 2 exams. To do so, we minimize a cost function (the errors in the prediction of admission) using a batch gradient descent method (from scipy, or a mini-batch gradient descent I wrote, by uncommenting it at the end of TD2/LR/main.py)
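The k-Nearest Neighbors vote used for the digits can be sketched on toy 2-D points (the real TD works on pixel feature vectors, not shown here):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)        # Euclidean distances
    nearest = y_train[np.argsort(dists)[:k]]           # labels of the k closest
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                   # majority vote

# Two toy clusters standing in for digit feature vectors
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.05, 0.0])))  # → 0
```

There is no training phase: all the work happens at prediction time, which is why kNN scales poorly with dataset size.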
- Various methods:
  - Feature Selection: `make FS`
  - SVD: `make SVD`
  - PCA: `make PCA`
  - NMF: `make NMF`
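The PCA and SVD targets are closely related: PCA can be computed from the SVD of the centered data. A minimal sketch on synthetic data (the TD's own datasets are not shown here):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via the SVD of the centered data."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # scores in the reduced space

rng = np.random.default_rng(1)
# 3-D data that actually lies near a 2-D plane
Z = rng.normal(size=(100, 2))
X = Z @ rng.normal(size=(2, 3)) + rng.normal(scale=0.01, size=(100, 3))
scores = pca(X, 2)
print(scores.shape)   # (100, 2)
```

The rows of `Vt` are the principal directions, sorted by explained variance, so truncating to `n_components` keeps the most informative subspace.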
- Introduction to TensorFlow
  - Neural Network to learn on the MNIST dataset: `make NN` to launch the model
  - Neural Network to find the type of landscape based on characteristics of the picture: `make HW` to launch the model (and a 2-minute training)
  - The report is available here
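The TD builds these classifiers with TensorFlow; as a framework-free illustration of what one forward pass of such a network computes, here is a numpy sketch (the layer sizes and data are made up, not the TD's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features -> 8 hidden units -> 3 classes
W1 = rng.normal(scale=0.1, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 3)); b2 = np.zeros(3)

def forward(X):
    h = np.maximum(0, X @ W1 + b1)                  # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)         # softmax probabilities

X = rng.normal(size=(5, 4))
y = np.array([0, 1, 2, 1, 0])
p = forward(X)
loss = -np.log(p[np.arange(5), y]).mean()           # cross-entropy loss
print(round(loss, 3))
```

Training then consists of repeatedly lowering this cross-entropy loss by gradient descent on the weights, which TensorFlow automates via automatic differentiation.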
- Introduction to Keras: see the Jupyter notebook
- SVM and boosting on Decision Trees
  - SVM:
    - First example: determining the decision boundary with a linear kernel
    - Second example: Gaussian kernel to find a non-linear decision boundary
    - Third example: Gaussian kernel in the case of a blurred boundary
    - `make SVM1`, `make SVM2`, or `make SVM3`
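The contrast between the first two examples can be reproduced with scikit-learn on toy data (an illustration of the idea, not the TD's actual code or datasets):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Non-linearly separable toy data: the class is inside vs outside a circle
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)

linear = SVC(kernel="linear").fit(X, y)           # linear kernel (first example)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)      # Gaussian kernel (second example)
print(linear.score(X, y), rbf.score(X, y))        # the Gaussian kernel fits far better
```

On a fuzzy boundary (third example), lowering the `C` parameter softens the margin and tolerates misclassified points instead of chasing them.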
  - AdaBoost:
    - Implementation of the AdaBoost method on Decision Trees to improve their accuracy
    - Evaluation of the optimal depth
    - `make AB`
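A minimal, self-contained numpy sketch of AdaBoost on decision stumps (depth-1 trees) shows the reweighting mechanism; the TD's own implementation and data are not reproduced here:

```python
import numpy as np

def stump_predict(X, j, t, s):
    """Depth-1 tree: predict s where X[:, j] > t, and -s elsewhere."""
    return np.where(X[:, j] > t, s, -s)

def fit_stump(X, y, w):
    """Best single-feature threshold under sample weights w (labels in {-1, +1})."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in X[:, j]:
            for s in (1, -1):
                err = w[stump_predict(X, j, t, s) != y].sum()
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best, best_err

def adaboost(X, y, n_rounds=10):
    w = np.full(len(y), 1 / len(y))
    model = []
    for _ in range(n_rounds):
        (j, t, s), err = fit_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # weight of this stump
        w *= np.exp(-alpha * y * stump_predict(X, j, t, s))  # up-weight the mistakes
        w /= w.sum()
        model.append((alpha, j, t, s))
    return model

def predict(model, X):
    return np.sign(sum(a * stump_predict(X, j, t, s) for a, j, t, s in model))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # diagonal boundary: hard for one stump
model = adaboost(X, y)
print((predict(model, X) == y).mean())       # training accuracy
```

Each round concentrates the weights on the points the previous stumps got wrong, which is what lets an ensemble of weak trees approximate the diagonal boundary.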
- Regularized Logistic Regression
  - There is no linear decision boundary between the positives and the negatives
  - We work in a more complex feature space to find one with Logistic Regression: we map our 2 features to the array of all polynomial combinations of the 2 features (below a certain degree)
  - We need some regularization to avoid overfitting, so we add an L2 penalty term on the coefficients
  - Comment/uncomment lines 58-60 of main.py to use a home-made stochastic gradient descent
  - `make RLR`
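The L2-penalized cost and its gradient can be sketched in numpy (the data, feature map, and learning rate below are illustrative, not the TD's actual ones):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    """L2-regularized logistic cost and gradient (intercept theta[0] not penalized)."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy(); reg[0] = 0.0
    eps = 1e-12                        # guards log(0)
    cost = -(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)).mean() \
           + lam / (2 * m) * (reg @ reg)
    grad = X.T @ (h - y) / m + lam / m * reg
    return cost, grad

# Quadratic feature map on toy 2-D data, mirroring the TD's polynomial expansion
rng = np.random.default_rng(0)
P = rng.normal(size=(80, 2))
y = (P[:, 0]**2 + P[:, 1]**2 < 1).astype(float)   # circular (non-linear) boundary
X = np.column_stack([np.ones(80), P, P**2, P[:, :1] * P[:, 1:]])

theta = np.zeros(X.shape[1])
for _ in range(2000):                  # plain batch gradient descent
    c, g = cost_and_grad(theta, X, y, lam=1.0)
    theta -= 0.3 * g
acc = ((sigmoid(X @ theta) > 0.5) == y).mean()
print(acc)                             # training accuracy
```

Raising `lam` shrinks the polynomial coefficients and smooths the boundary, which is the overfitting control the TD demonstrates.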
- Reinforcement Learning
  - Based on the OpenAI Gym suite (the FrozenLake-v0 game)
  - We try to learn a policy to move on a frozen lake with holes
  - The results are non-deterministic (the agent can slip on the ice)
  - We implement some Reinforcement Learning methods:
    - The SARSA algorithm, based on the online update of the Q-function, with an epsilon-greedy exploration strategy
    - The Q-Learning algorithm, an off-policy algorithm that does not estimate the Q-function of its current policy directly but instead estimates the value of another policy (the optimal one). Set `qlearning = True` in main.py
  - We also implemented 2 different exploration strategies:
    - The epsilon-greedy exploration strategy
    - The softmax exploration strategy. Set `softmax = True` in main.py
  - To run the Reinforcement Learning algorithm, run `make`
  - To test that the environment is working fine, you can run `make test`. You should see a dummy inverted pendulum.
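The tabular Q-Learning update with epsilon-greedy exploration can be sketched without Gym; the corridor environment below is a made-up stand-in for FrozenLake-v0:

```python
import numpy as np

# Stand-in environment: a deterministic 5-state corridor with reward 1 at the
# rightmost state (the TD itself runs on Gym's FrozenLake-v0, not this toy MDP)
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)   # a: 0 = left, 1 = right
    return s2, float(s2 == GOAL), s2 == GOAL             # next state, reward, done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))
alpha, gamma, eps = 0.5, 0.9, 0.3

for _ in range(500):                                     # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy behavior policy
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Off-policy Q-Learning update: bootstrap on the greedy action in s2
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q[:4].argmax(axis=1))   # greedy action per non-terminal state (1 = right)
```

SARSA differs only in the bootstrap term: it uses the Q-value of the action the behavior policy actually takes in `s2`, instead of `Q[s2].max()`, which is what makes it on-policy.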
- Unsupervised learning
  - Based on the k-means algorithm: `make KM`
  - Based on Spectral Clustering, to also detect non-convex clusters: `make SC`
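The k-means iteration (Lloyd's algorithm) can be sketched in numpy on toy blobs; the farthest-first initialization used here is a choice made for determinism, not necessarily what the TD does:

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    # Farthest-first initialization: deterministic and spreads the centers out
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Update step: each center moves to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(0)
# Two well-separated toy blobs
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centers = kmeans(X, 2)
print(centers.round(1))   # one center per blob
```

Because k-means assigns by Euclidean distance to the center, it only finds convex clusters, which is exactly the limitation Spectral Clustering lifts.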