iubh/DLMDSML01

DLMDSML01 - Machine Learning

Q & A - Sessions

Introduction to Machine-Learning & Optimization

Machine Learning Applications

01b_ml_applications.ipynb (last update: 2021-08-22)

Our First Machine Learning Model

01_intro_to_ml.ipynb (last update: 2021-08-22)

Optimization Algorithms in Machine Learning and Beyond

02_optimization_algorithms.ipynb (last update: 2021-03-23)

Regression & Classification

Regression

03_regression.ipynb (last update: 2021-04-06)

Hands-On Classification

not yet prepared (last update: xx-xx-xx)

Multiclass Classification

multiclass_classification.ipynb (last update: 2021-02-09): We discuss how to generalize a binary classification problem to a multiclass classification problem. First, we show how to turn a logistic regression model into a multinomial logistic regression model. Then, using the Iris dataset, we show how to generalize the sklearn classification algorithms to multiclass problems. After a look at multiclass performance metrics, such as the multiclass confusion matrix, we discuss the meta-estimators available in *sklearn.multiclass*, which can help improve the accuracy and runtime performance of the classifiers.
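
As a rough illustration of the meta-estimator idea, the following sketch wraps a binary logistic regression in `OneVsRestClassifier` and fits it on the Iris dataset. The train/test split, `max_iter=1000`, and the random seed are assumptions chosen for this example and are not taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
# Assumed split: 30% held out, stratified so every class appears in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The meta-estimator trains one binary logistic regression per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)

y_pred = ovr.predict(X_test)
# Multiclass confusion matrix: rows are true classes, columns are predictions.
cm = confusion_matrix(y_test, y_pred)
accuracy = ovr.score(X_test, y_test)
print(cm)
print(f"accuracy: {accuracy:.3f}")
```

`OneVsOneClassifier` from the same module can be swapped in with the same interface; it trains one classifier per pair of classes instead.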

Clustering

Hands-On Clustering

02_clustering.ipynb (last update: 2021-04-26): We analyze clustering algorithms from both a practical and a theoretical perspective. We look in detail at different clustering approaches, such as k-means clustering, Gaussian mixture models, DBSCAN and hierarchical clustering. To gain insight into the theoretical aspects of clustering, we discuss the concept of similarity measures and define metrics to measure the quality of clustering methods. Finally, we evaluate our techniques on a clustering use case.
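
A minimal sketch of this workflow, assuming synthetic data from `make_blobs` in place of the notebook's use case: k-means assigns the cluster labels, and the silhouette score serves as one possible quality metric. The number of clusters and all parameter values are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2D data with three blobs (assumed stand-in for a real use case).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters.
score = silhouette_score(X, labels)
print(f"silhouette score: {score:.3f}")
```

`GaussianMixture`, `DBSCAN` and `AgglomerativeClustering` from scikit-learn can be evaluated with the same metric, which makes a side-by-side comparison straightforward.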

Hands-On Clustering - Part II

02b_clustering.ipynb (last update: 2021-05-04)

Additional: Maximum Likelihood and Expectation-Maximization Algorithm

02c_MLE_and_EM_algorithm.ipynb (last update: 2021-10-19)

Support Vector Machines

Hands-On Support Vector Machines

04_support_vector_machines.ipynb (last update: 2021-06-08)

Decision Trees and Ensemble Methods

Decision Trees and Random Forests

05_decision_trees_and_random_forests.ipynb (last update: 2021-06-22)

Boosting Methods

09_boosting_methods.ipynb (last update: 2021-07-06): We deepen our understanding of tree-based ensemble methods by looking at how boosted trees work. After discussing an analytical example, we move on to scikit-learn's implementation of boosted trees. We also discuss more recent algorithms such as XGBoost, LightGBM and CatBoost.
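
To make the stage-wise nature of boosting concrete, here is a small sketch using scikit-learn's `GradientBoostingClassifier`. The synthetic dataset and the hyperparameters are assumptions for illustration; the notebook may use different data and settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Assumed synthetic binary classification problem.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 shallow trees is fitted to correct the current ensemble's errors.
gbc = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbc.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting stage.
staged_accuracy = [
    float((pred == y_test).mean()) for pred in gbc.staged_predict(X_test)
]
final_accuracy = gbc.score(X_test, y_test)
print(f"accuracy after 1 tree: {staged_accuracy[0]:.3f}")
print(f"accuracy after 100 trees: {final_accuracy:.3f}")
```

Plotting `staged_accuracy` against the stage index is a simple way to see how the ensemble improves as trees are added and where it starts to saturate.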

Genetic Algorithms (GAs)

Theory and Concepts

Q_A_genetic_algorithms_theory.ipynb (last update: 2021-07-20): Based on *Haupt & Haupt, Practical Genetic Algorithms (2004)*, we discuss how to approach GAs for both binary and continuous problems. We work through how to encode variables, create the initial population, perform natural selection, and apply mating/crossover and mutation strategies until convergence is reached.
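
The steps listed above can be sketched as a small binary GA in pure Python. This is a simplified illustration and not the book's or the notebook's exact procedure: the function name `run_binary_ga`, the keep-the-better-half selection, single-point crossover, a fixed generation count in place of a convergence check, and all default parameters are assumptions made for the example.

```python
import random


def run_binary_ga(fitness, n_bits, pop_size=20, n_generations=50,
                  mutation_rate=0.02, seed=0):
    """Maximize `fitness` over bit strings of length `n_bits` (n_bits >= 2)."""
    rng = random.Random(seed)
    # Initial population: random bit strings (chromosomes).
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(n_generations):
        # Natural selection: rank by fitness and keep the better half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Mating: single-point crossover between randomly paired survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            point = rng.randint(1, n_bits - 1)
            children.append(mother[:point] + father[point:])
        # Mutation: flip bits of the children with a small probability.
        # Survivors stay untouched, so the best chromosome is never lost.
        for child in children:
            for i in range(n_bits):
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
        population = survivors + children
    return max(population, key=fitness)


# Toy objective: maximize the number of ones in the chromosome.
best = run_binary_ga(sum, n_bits=16)
print(best, sum(best))
```

For a continuous problem, the same loop applies once the chromosome holds floating-point values and the crossover and mutation operators are replaced by blending and random perturbation.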

Applications

Q_A_genetic_algorithms_applications.ipynb (last update: 2021-07-20): The knapsack problem and the traveling salesman problem.

Additional Material

Performance Metrics

performance_measures.ipynb (last update: 2020-12-22): We discuss how to evaluate the performance of a machine-learning algorithm, for both supervised and unsupervised tasks. The notebook explores the individual performance measures provided by *sklearn.metrics*.
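
As a small example of what such an exploration might look like, the sketch below computes a few *sklearn.metrics* measures on hand-made labels. The label vectors are invented for illustration and do not come from the notebook.

```python
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Supervised case: invented binary labels with one false negative and one false positive.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)  # [[TN, FP], [FN, TP]]
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Unsupervised case: the adjusted Rand index ignores how clusters are named,
# so a relabelled but otherwise identical clustering scores 1.0.
ari = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])

print(cm)
print(accuracy, precision, recall, f1, ari)
```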

Recommendation Systems

recommendation_systems.ipynb (last update: 2021-01-05): We discuss the basic principles of how to implement recommendation systems. For the MovieLens dataset we build a first, simple user-based collaborative filtering movie recommendation system.
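
The core of user-based collaborative filtering can be sketched in a few lines of pure Python. The tiny `ratings` dictionary below is invented for illustration and is not MovieLens data; the helper names and the choice of cosine similarity over co-rated movies are likewise assumptions.

```python
from math import sqrt

# Invented toy ratings (user -> movie -> rating on a 1-5 scale).
ratings = {
    "alice": {"Toy Story": 5, "Heat": 1, "Casino": 2},
    "bob": {"Toy Story": 4, "Heat": 2, "Casino": 1, "Babe": 5},
    "carol": {"Toy Story": 1, "Heat": 5, "Babe": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict_rating(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[movie]
        denominator += abs(sim)
    # None signals that no other user has rated the movie.
    return numerator / denominator if denominator else None


prediction = predict_rating("alice", "Babe", ratings)
print(f"predicted rating: {prediction:.2f}")
```

Because Alice's tastes are closer to Bob's than to Carol's, the prediction is pulled towards Bob's rating of 5. On a dataset the size of MovieLens, the same logic would typically be vectorized with a user-item matrix.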

Machine Learning and Parallel Computing

multiclass_classification.ipynb (last update: 2021-02-23): Using a simple example, we show how easy it is to parallelize a for-loop in Python (see main.py and main_multi.py). We then turn to parallelizable tasks in machine learning, the difference between data and model parallelization, GPU usage and cloud computing.
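
The following sketch shows one way to turn a serial loop into a concurrent one with the standard library's `concurrent.futures`. It is not the content of main.py or main_multi.py; `slow_task`, `run_serial` and `run_parallel` are hypothetical names, and a thread pool is used by default only so that the example runs anywhere without extra setup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    """Stand-in for an expensive loop iteration that is independent of the others."""
    return sum(math.sqrt(i) for i in range(n))


def run_serial(inputs):
    # Plain for-loop (as a list comprehension): one iteration after another.
    return [slow_task(n) for n in inputs]


def run_parallel(inputs, executor_cls=ThreadPoolExecutor, max_workers=4):
    # executor.map distributes the iterations over the workers
    # and returns the results in input order.
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(slow_task, inputs))


inputs = [1000, 2000, 3000]
print(run_parallel(inputs) == run_serial(inputs))
```

For CPU-bound pure-Python work, threads are limited by the global interpreter lock, so a real speed-up usually requires passing `ProcessPoolExecutor` as `executor_cls` and calling `run_parallel` from under an `if __name__ == "__main__":` guard.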

Open Questions

open_questions.ipynb (last update: 2021-08-10): Open questions on Machine Learning, where you can test your knowledge and understanding.
