
Module 2 course notes #6

Merged Oct 12, 2023 · 22 commits

Changes from 1 commit

Commits
6c1904d  Update ml-monitoring-metrics.md (dmaliugina, Oct 10, 2023)
1a1cc3f  Update ml-monitoring-setup.md (dmaliugina, Oct 10, 2023)
80091d7  Update ml-monitoring-architectures.md (dmaliugina, Oct 10, 2023)
b8369fc  Create readme.md (dmaliugina, Oct 12, 2023)
75068b4  Added images for Module 2 (dmaliugina, Oct 12, 2023)
4339cae  Create evaluate-ml-model-quality.md (dmaliugina, Oct 12, 2023)
b59d99a  Create ml-quality-metrics-classification-regression-ranking.md (dmaliugina, Oct 12, 2023)
b67550f  Update ml-quality-metrics-classification-regression-ranking.md (dmaliugina, Oct 12, 2023)
f7e182f  Create ml-model-quality-code-practice.md (dmaliugina, Oct 12, 2023)
e3355a3  Update ml-model-quality-code-practice.md (dmaliugina, Oct 12, 2023)
6df945a  Create data-quality-in-ml.md (dmaliugina, Oct 12, 2023)
13ad922  Create data-quality-code-practice.md (dmaliugina, Oct 12, 2023)
d00d6ca  Create data-prediction-drift-in-ml.md (dmaliugina, Oct 12, 2023)
151e6f6  Update data-prediction-drift-in-ml.md (dmaliugina, Oct 12, 2023)
c439695  Create data-prediction-drift-code-practice.md (dmaliugina, Oct 12, 2023)
e9bb925  Update ml-model-quality-code-practice.md (dmaliugina, Oct 12, 2023)
3a3370b  Update data-quality-code-practice.md (dmaliugina, Oct 12, 2023)
3046ba1  Update ml-monitoring-metrics.md (dmaliugina, Oct 12, 2023)
38b651b  Delete docs/book/ml-observability-course/module-2-ml-monitoring-metri… (dmaliugina, Oct 12, 2023)
b8f66c0  Update ml-monitoring-architectures.md (dmaliugina, Oct 12, 2023)
849029c  Update README.md (dmaliugina, Oct 12, 2023)
0d4b896  Update SUMMARY.md (dmaliugina, Oct 12, 2023)
Update ml-quality-metrics-classification-regression-ranking.md
dmaliugina authored Oct 12, 2023
commit b67550f048adf58b600be8fbc4c33a646ebf10a9
@@ -32,7 +32,7 @@ Methods to help visualize and understand classification quality metrics include:
* **Class separation quality** helps visualize correct and incorrect predictions for each class.
* **Error analysis**. You can also map predicted probabilities or model errors alongside feature values and explore whether a specific type of misclassification is connected to particular feature values (see the sketch below).

-[](<../../../images/2023109\_course\_module2.016.png>)
+![](<../../../images/2023109\_course\_module2.016.png>)
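
To make that error-analysis step concrete, here is a minimal, hypothetical Python (pandas) sketch of comparing a feature's values for correct vs. misclassified objects. The feature name, labels, and toy data are illustrative only and do not come from the course code.

```python
import pandas as pd

# Hypothetical toy data: one feature, true labels, predicted labels, predicted probabilities
df = pd.DataFrame({
    "age":     [23, 35, 41, 52, 29, 60],
    "y_true":  [0, 1, 1, 0, 0, 1],
    "y_pred":  [0, 1, 0, 0, 1, 1],
    "y_proba": [0.10, 0.80, 0.40, 0.20, 0.60, 0.90],
})

# Flag misclassified objects and compare the feature distribution
# for correct vs. incorrect predictions
df["is_error"] = df["y_true"] != df["y_pred"]
print(df.groupby("is_error")["age"].describe())

# Inspect predicted probabilities for the errors only:
# are the mistakes low-confidence or confidently wrong?
print(df.loc[df["is_error"], ["age", "y_proba"]])
```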

{% hint style="info" %}
**Further reading:** [What is your model hiding? A tutorial on evaluating ML models](https://www.evidentlyai.com/blog/tutorial-2-model-evaluation-hr-attrition).
@@ -47,15 +47,15 @@ Regression models provide numerical output which is compared against actual valu
* **Mean Absolute Percentage Error (MAPE)** averages all absolute errors in %. Works well for datasets with objects of different scales (e.g., tens, thousands, or millions).
* **Symmetric MAPE** applies a different penalty to over- and underestimation (see the calculation sketch below).

-[](<../../../images/2023109\_course\_module2.020.png>)
+![](<../../../images/2023109\_course\_module2.020.png>)
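
As a quick illustration of how these error metrics are computed, here is a minimal sketch on made-up numbers; it is not part of the course code, and the arrays are hypothetical.

```python
import numpy as np

# Hypothetical toy data: actual vs. predicted values
y_true = np.array([120.0, 80.0, 300.0, 55.0])
y_pred = np.array([100.0, 90.0, 310.0, 40.0])

abs_err = np.abs(y_true - y_pred)

mae = abs_err.mean()                            # Mean Absolute Error
mape = np.mean(abs_err / np.abs(y_true)) * 100  # Mean Absolute Percentage Error, %
smape = np.mean(2 * abs_err / (np.abs(y_true) + np.abs(y_pred))) * 100  # Symmetric MAPE, %

print(f"MAE: {mae:.2f}, MAPE: {mape:.1f}%, sMAPE: {smape:.1f}%")
```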

Some of the methods to analyze and visualize regression model quality are:
* **Predicted vs. Actual** value plots and **Error over time** plots help reveal patterns in model predictions and behavior (e.g., does the model tend to make bigger errors on weekends or during hours of peak demand?).
* **Error analysis**. It is often important to distinguish between **underestimation** and **overestimation** during error analysis. Since these errors might have different business costs, this can help optimize model performance for business metrics based on the use case (see the sketch below).

You can also map extreme errors alongside feature values and explore whether a specific type of error is connected to particular feature values.

-[](<../../../images/2023109\_course\_module2.025.png>)
+![](<../../../images/2023109\_course\_module2.025.png>)
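
Continuing the same hypothetical toy data, a minimal sketch of splitting errors by sign so that over- and underestimation can be tracked (and costed) separately:

```python
import numpy as np

# Same hypothetical toy data as in the sketch above
y_true = np.array([120.0, 80.0, 300.0, 55.0])
y_pred = np.array([100.0, 90.0, 310.0, 40.0])

error = y_pred - y_true
overestimation = error[error > 0]    # model predicted too high
underestimation = error[error < 0]   # model predicted too low

print("Mean overestimation: ", overestimation.mean())
print("Mean underestimation:", underestimation.mean())
```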

## Ranking quality metrics

@@ -69,7 +69,7 @@ We need to estimate the order of objects to measure quality in ranking tasks. So
* **Recall @k** measures the coverage of all relevant objects in the top-K results.
* **Lift @k** reflects the improvement over random ranking (see the sketch below).

-[](<../../../images/2023109\_course\_module2.028.png>)
+![](<../../../images/2023109\_course\_module2.028.png>)
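
To ground these definitions, here is a minimal, hypothetical sketch computing Recall @k and Lift @k for a single ranked list; the relevance labels are made up, and Precision @k (a related standard metric) is used as the intermediate value for Lift @k.

```python
import numpy as np

# Hypothetical toy data: relevance labels (1 = relevant) for 10 candidates,
# listed in the order the model ranked them
ranked_relevance = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
k = 5

# Relevant share of the top-K results
precision_at_k = ranked_relevance[:k].mean()

# Share of all relevant objects captured in the top-K
# (assuming all relevant objects appear in this candidate list)
recall_at_k = ranked_relevance[:k].sum() / ranked_relevance.sum()

# Improvement over a random ordering of the same candidates
lift_at_k = precision_at_k / ranked_relevance.mean()

print(f"Precision@{k}: {precision_at_k:.2f}, Recall@{k}: {recall_at_k:.2f}, Lift@{k}: {lift_at_k:.2f}")
```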

If you work on a recommender system, you might want to consider additional – “beyond accuracy” – metrics that reflect RecSys behavior. Some examples are:
* Serendipity