diff --git a/episodes/fertility-prediction.Rmd b/episodes/fertility-prediction.Rmd
index 6c2e382..dd41bb2 100644
--- a/episodes/fertility-prediction.Rmd
+++ b/episodes/fertility-prediction.Rmd
@@ -514,23 +514,27 @@ When proceeding it would be better to use evaluation metrics for this.
 ## Challenge: Evaluation metrics
 
 Evaluate the model using the appropriate evaluation metrics. Hint: the dataset is unbalanced.
-
 :::: solution
 
 ## Solution
 
-Good evaluation metrics would be macro precision, recall, and F1-score,
-because we want to get a feeling for how the model performs in both classes of the target variable.
-In other words, we value a model that can predict both true positives as well as true negatives.
+Good evaluation metrics would be precision, recall, and F1-score for the positive class (having a child in the next 3 years).
+This also makes sense, since these are the metrics used in the benchmark.
+
+Precision tells us how many of the households that the model labels as 'fertile' are in fact correct predictions.
+Recall tells us how many of the households that are actually 'fertile' the model correctly 'detects' as fertile.
+
+The F1-score is the harmonic mean of the two.
 
 ```python
 y_pred = model.predict(X_test)
-p, r, f, _ = precision_recall_fscore_support(y_test, y_pred, average='macro')
+p, r, f, _ = precision_recall_fscore_support(y_test, y_pred, average='binary')
 print(f'Precision: {p}, recall: {r}, F1-score: {f}')
 ```
 ```outcome
-Precision: 0.6297419895408973, recall: 0.7251215721662405, F1-score: 0.6295138888888889
+Precision: 0.23387096774193547, recall: 0.6590909090909091, F1-score: 0.3452380952380952
 ```
 
+Challenge: Test your understanding of precision and recall by computing the scores by hand! You can use the numbers shown in the confusion matrix for this.
 ::::
 :::
@@ -538,7 +542,7 @@ Precision: 0.6297419895408973, recall: 0.7251215721662405, F1-score: 0.629513888
 ## 10. Adapt, train, evaluate. Adapt, train, evaluate.
 
 Good job! You have now set up a simple, yet effective machine learning pipeline on a real-world problem. Notice that you already went through the machine learning cycle twice.
-From this point onwards it is a matter of adapting your approach, train the model, evaluate the results. Again, and again, and again.
+From this point onward it is a matter of adapting your approach, training the model, and evaluating the results. Again, and again, and again.
 Of course there is still a lot of room for improvement. Every time you evaluate the results, try to come up with a shortlist of things that seem most promising to try out in the next cycle.
 
 
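
The solution in the first hunk invites readers to recompute precision and recall by hand from the confusion matrix. As an optional aid, here is a minimal sketch of that computation, assuming `y_test` and `y_pred` from the solution's code block are still in scope and using scikit-learn's `confusion_matrix`; it should reproduce the scores shown in the outcome block.

```python
from sklearn.metrics import confusion_matrix

# For a binary problem, scikit-learn orders the confusion matrix as
# [[tn, fp], [fn, tp]], so ravel() unpacks it in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

precision = tp / (tp + fp)  # of the households predicted 'fertile', the fraction that truly are
recall = tp / (tp + fn)     # of the truly 'fertile' households, the fraction the model detects
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f'Precision: {precision}, recall: {recall}, F1-score: {f1}')
```

On an unbalanced dataset like this one, the positive-class scores computed with `average='binary'` are typically lower than the macro-averaged scores, which is exactly what the hint about the dataset being unbalanced points at.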