This project aims to practice the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM). The repository is organized into three parts: data understanding, supervised learning, and unsupervised learning.
-
In P1 (data understanding), I practice exploring the data and checking its quality by plotting the numeric and categorical features. I also apply several preprocessing methods: min-max scaling to [0, 1], standardization to zero mean and unit variance, and one-hot encoding.
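A minimal NumPy sketch of the three P1 preprocessing steps follows. It assumes the features are stored column-wise in a 2-D array and that constant columns should be left at zero rather than divided by zero; the function names `min_max_scale`, `standardize`, and `one_hot` are hypothetical, not taken from the repository.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column of a 2-D array linearly to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: constant columns (range 0) are divided by 1 instead of 0.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / col_range

def standardize(X):
    """Shift each column to mean 0 and scale it to unit (population) variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return (X - X.mean(axis=0)) / std

def one_hot(labels):
    """Encode a 1-D sequence of categories as a 0/1 indicator matrix."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)))
    for row, c in enumerate(labels):
        out[row, index[c]] = 1.0
    return categories, out
```

For example, `one_hot(["red", "green", "red"])` returns the sorted categories `["green", "red"]` together with one indicator row per input value.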
-
In P2 (supervised learning), three methods are implemented: k-nearest neighbors (KNN) classification, ridge regression, and KNN regression. For hyperparameter optimization, I used leave-one-out cross-validation.
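The P2 methods can be sketched in NumPy as below. This is an illustration under stated assumptions, not the repository's code: KNN uses Euclidean distance with a majority vote (ties go to the smallest label), ridge regression uses the closed-form solution with an unpenalized intercept, and leave-one-out cross-validation is shown for choosing k. The names `knn_predict`, `ridge_fit`, and `loocv_error` are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k, task="classification"):
    """Predict with the k nearest training points under Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    neighbour_y = y_train[nearest]
    if task == "regression":
        # KNN regression: average the neighbours' targets.
        return neighbour_y.mean(axis=1)
    # KNN classification: majority vote (np.unique sorts, so ties pick the smallest label).
    preds = []
    for row in neighbour_y:
        values, counts = np.unique(row, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (Xc^T Xc + lam*I)^-1 Xc^T yc."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # Centring keeps the intercept out of the penalty.
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def loocv_error(X, y, k, task="classification"):
    """Leave-one-out CV error of KNN: misclassification rate or mean squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i  # train on every sample except sample i
        pred = knn_predict(X[mask], y[mask], X[i:i + 1], k, task)[0]
        if task == "regression":
            errors.append((pred - y[i]) ** 2)
        else:
            errors.append(float(pred != y[i]))
    return float(np.mean(errors))
```

To pick k, one could evaluate a small grid, e.g. `min((1, 3, 5), key=lambda k: loocv_error(X, y, k))`; the same loop works for the ridge penalty `lam` by swapping in `ridge_fit` and a squared-error score.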
-
In P3 (unsupervised learning), several preprocessing and data visualization methods are implemented: z-score standardization, principal component analysis (PCA), and dendrograms. Two clustering methods are then applied: agglomerative hierarchical clustering and k-means clustering.
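A rough NumPy sketch of three of the P3 steps: z-score standardization, PCA computed from the SVD of the centred data, and k-means via Lloyd's algorithm (random initial centroids drawn from the data, with a fixed seed). The function names and the initialization scheme are assumptions for illustration; the repository may implement these differently.

```python
import numpy as np

def zscore(X):
    """Standardize each column to mean 0 and unit (population) variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)

def pca(X, n_components):
    """Project centred data onto its top principal directions."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S ** 2 / (len(X) - 1)
    return Xc @ Vt[:n_components].T, explained_var[:n_components]

def _assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return _assign(X, centroids), centroids
```

A typical pipeline would call `pca(zscore(X), 2)` to get 2-D coordinates for plotting. For the agglomerative clustering and dendrograms, SciPy's `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.dendrogram` are one established option.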