
Classification problems with imbalanced datasets: a SMOTE approach


cheeann13/Custormer-Segmentation


Customer Segmentation - Accuracy, F1, Recall: Which Metric to Optimize?

The classic trade-off of an imbalanced dataset, and why the model with the highest accuracy isn't always the best one.

Problem Statement 🏦

Our bank has been facing declining revenue lately. To combat this, it has launched marketing campaigns to encourage more term deposits.

The bank opted to develop a deep-learning model to forecast the campaign's results. This lets the marketing team pinpoint customer segments with high potential and target them directly, while minimizing ad spend on customers who are unlikely to subscribe.

Dataset Given

(figure: dataset preview)

Initial Model (No Oversampling)

(figure: initial model's classification results)

Accuracy at 0.91 sounds like a fantastic model, right?

Well, not so fast! The marketing manager has some concerns.

Model Evaluation

When customers see our campaign, they either subscribe to a term deposit or don't. This gives us four possible outcomes, each carrying a different weight for the marketing team:

| Outcome | Description |
| --- | --- |
| True Positive (TP) | Customers subscribed, just as we predicted! 😍 |
| True Negative (TN) | Customers didn't subscribe, and we accurately anticipated it. This foresight optimizes our marketing budget 😎 |
| False Positive (FP) | Oops! We expected these customers to subscribe, but they didn't. This misjudgment costs the bank 😬 |
| False Negative (FN) | Oh no! These customers subscribed, but we missed them. Bad: the bank loses revenue, and we're losing our jobs here 😱 |
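The four outcomes above are just counts over (true label, prediction) pairs. A minimal sketch of how they are tallied, using made-up illustrative labels (1 = subscribed, 0 = didn't), not the actual dataset:

```python
# Made-up illustrative labels: 1 = subscribed, 0 = did not subscribe.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # caught subscribers
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # correctly skipped
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # wasted ad spend
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # missed subscribers

print(tp, tn, fp, fn)  # → 2 3 1 2
```

Every metric below (accuracy, recall, F1) is just a different ratio of these four numbers.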

Recall - When the TP Class is Super Important

Looking at the recall for the positive class, it's only 0.37. This means that out of every 1000 customers who would actually subscribe, our model mistakenly labels 630 of them as not interested! 😱
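This is exactly how accuracy can look great while recall is terrible: when positives are rare, getting the majority class right dominates the score. A sketch with illustrative counts (chosen to reproduce the 0.91 / 0.37 figures; they are not the actual dataset counts):

```python
# Illustrative counts only, chosen to match accuracy 0.91 and recall 0.37.
tp, fn = 370, 630      # of 1000 real subscribers, the model catches only 370
tn, fp = 8_730, 270    # the many non-subscribers are mostly predicted correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions over all 10,000
recall = tp / (tp + fn)                     # share of real subscribers we caught

print(round(accuracy, 2), round(recall, 2))  # → 0.91 0.37
```

The huge true-negative block props up accuracy, while the metric the campaign actually cares about, recall on subscribers, stays poor.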

Suddenly, this model doesn't seem so great for our campaign.

SMOTE - Oversampling Imbalanced Datasets

(figures: results after SMOTE resampling)

After using SMOTE to resample our data, our accuracy took a hit, dropping to 0.84. But here's the silver lining: the recall for TP shot up to 0.91! 🥳

This means that out of 1000 customers who would subscribe, our model now catches 910 of them. The marketing team is ecstatic! They can now confidently use our model to segment customers and supercharge their sales funnel.

Credits:

Data source: Kaggle
