
Classification problems with imbalanced datasets: a SMOTE approach


cheeann13/Custormer-Segmentation


Customer Segmentation - Accuracy, F1, Recall: Which Metric to Optimize?

The classic trade-off of an imbalanced dataset, and why the model with the highest accuracy isn't always the best one.

Problem Statement 🏦

Our bank has been facing declining revenue lately. To combat this, it has launched marketing campaigns to encourage more term deposits.

The bank opted to develop a deep-learning model to forecast the campaign's results. This lets the marketing team pinpoint customer segments with high potential and target them directly, while minimizing ad spend on customers who are unlikely to subscribe.

Dataset Given

(figure: dataset preview)

Initial Model (No Oversampling)

(figure: initial model's classification results)

Accuracy at 0.91 sounds like a fantastic model, right?

Well, not so fast! The marketing manager has some concerns.

Model Evaluation

When customers see our campaign, they either subscribe to a term deposit or don't. This gives us four possible outcomes, each carrying a different weight for the marketing team:

| Outcome | Description |
| --- | --- |
| True Positive (TP) | Customers subscribed, just as we predicted! 😍 |
| True Negative (TN) | Customers didn't subscribe, and we accurately anticipated it. This foresight optimizes our marketing budget 😎 |
| False Positive (FP) | Oops! We expected these customers to subscribe, but they didn't. This misjudgment costs the bank 😬 |
| False Negative (FN) | Oh no! These customers subscribed, but we missed them. Bad: the bank loses revenue, and we're losing our jobs here 😱 |
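The four outcomes above are just counts over (true label, prediction) pairs. A minimal sketch of how they are tallied, using made-up illustrative labels (1 = subscribed, 0 = didn't), not the actual dataset:

```python
# Made-up illustrative labels: 1 = subscribed, 0 = did not subscribe.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # caught subscribers
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # correctly skipped
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # wasted ad spend
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # missed subscribers

print(tp, tn, fp, fn)  # → 2 3 1 2
```

Every metric below (accuracy, recall, F1) is just a different ratio of these four numbers.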

Recall - When the TP Class is Super Important

Looking at the recall for the positive class, it's only 0.37. This means that out of every 1000 customers who would actually subscribe, our model mistakenly labels 630 of them as not interested! 😱
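This is exactly how accuracy can look great while recall is terrible: when positives are rare, getting the majority class right dominates the score. A sketch with illustrative counts (chosen to reproduce the 0.91 / 0.37 figures; they are not the actual dataset counts):

```python
# Illustrative counts only, chosen to match accuracy 0.91 and recall 0.37.
tp, fn = 370, 630      # of 1000 real subscribers, the model catches only 370
tn, fp = 8_730, 270    # the many non-subscribers are mostly predicted correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions over all 10,000
recall = tp / (tp + fn)                     # share of real subscribers we caught

print(round(accuracy, 2), round(recall, 2))  # → 0.91 0.37
```

The huge true-negative block props up accuracy, while the metric the campaign actually cares about, recall on subscribers, stays poor.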

Suddenly, this model doesn't seem so great for our campaign.

SMOTE - Oversampling Imbalanced Datasets

(figures: results after SMOTE resampling)

After using SMOTE to resample our data, our accuracy took a hit, dropping to 0.84. But here's the silver lining: the recall for TP shot up to 0.91! 🥳

This means that out of 1000 customers who would subscribe, our model now catches 910 of them. The marketing team is ecstatic! They can now confidently use our model to segment customers and supercharge their sales funnel.

Credits:

Data source: Kaggle
