SAS-Project-Unbalanced-Classes

SAS Project

Unbalanced Classes

HSE 2021

Prepared by
Zobov Vladimir, Karzanov Daniil, Molostvov Pavel
Supervisor: Maria Vorobyova

The main.ipynb notebook contains the Python implementation of the following items:

Data analysis
Our top priority is to minimize the number of actual defaults that the model classifies as non-default: after the model screens the applicants, the bank should approve as few clients who cannot repay the loan as possible. At the same time, the number of issued loans should not drop significantly once the model is in use. We therefore assess model performance with two metrics. FOR (False Omission Rate) is the share of defaulters among the clients the model proposes to approve. FPR (False Positive Rate) is the share of all non-defaulters whom the model denies a loan. The best model keeps both metrics as low as possible.
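As a minimal sketch of the two metrics, assuming the label convention 1 = default (positive class) and 0 = non-default, both rates can be computed from confusion-matrix counts. The function names and the toy labels below are illustrative and are not taken from the notebook.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = default (positive) and 0 = non-default."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def false_omission_rate(y_true, y_pred):
    """FOR = FN / (FN + TN): share of defaulters among approved clients."""
    _, _, tn, fn = confusion_counts(y_true, y_pred)
    return fn / (fn + tn) if (fn + tn) else 0.0


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): share of non-defaulters who were denied a loan."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy labels, invented purely for illustration.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print("FOR:", false_omission_rate(y_true, y_pred))
print("FPR:", false_positive_rate(y_true, y_pred))
```

A predicted 0 means the loan is approved, so FOR looks only at approved clients while FPR looks only at clients who would in fact have repaid.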
Feature engineering
There are several ways to deal with an imbalanced target. We will try under- and over-sampling techniques as well as special models that rebalance the target while fitting. We will also try simply increasing the classification threshold, which is often enough to get very good results.
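The effect of moving the threshold can be sketched as follows. This is an illustration under assumptions: the model is taken to output a probability of default, a client is rejected when that probability is at or above the threshold, and the scores are made up. Which direction counts as "increasing" depends on which class probability the notebook compares against the threshold.

```python
def rates_at_threshold(y_true, p_default, threshold):
    """Return (FOR, FPR) when a client is rejected if p_default >= threshold."""
    fp = tn = fn = 0
    for t, p in zip(y_true, p_default):
        rejected = p >= threshold
        if t == 0 and rejected:
            fp += 1          # good client denied a loan
        elif t == 0:
            tn += 1          # good client approved
        elif not rejected:
            fn += 1          # defaulter approved
    for_rate = fn / (fn + tn) if (fn + tn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return for_rate, fpr


# Hypothetical labels (1 = default) and predicted default probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
p_default = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.60, 0.35, 0.55, 0.80]

for threshold in (0.3, 0.5, 0.7):
    for_rate, fpr = rates_at_threshold(y_true, p_default, threshold)
    print(f"threshold={threshold:.1f}  FOR={for_rate:.2f}  FPR={fpr:.2f}")
```

On this toy data a lower threshold drives FOR down at the cost of a higher FPR, which is the trade-off the two metrics are meant to expose.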
Undersampling
Undersampling is a group of techniques that reduce the data by removing examples of the majority class, with the aim of equalizing the number of examples in each class.
- Random undersampling
- Tomek links
- InstanceHardnessThreshold
- NeighbourhoodCleaningRule
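The simplest item in the list above, random undersampling, can be sketched in plain Python. This is a hand-rolled illustration, not the notebook's code: the function name and toy data are assumptions, and the notebook presumably relies on a library implementation instead.

```python
import random


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that all classes match the smallest one."""
    rng = random.Random(seed)
    idx_by_class = {}
    for i, label in enumerate(y):
        idx_by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in idx_by_class.values())
    keep = []
    for idx in idx_by_class.values():
        # The minority class is kept whole; larger classes are cut down to n_min.
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[i, i % 3] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = random_undersample(X, y)
print(y_res.count(0), y_res.count(1))
```

The other listed methods differ in which majority rows they drop: rather than choosing at random, they remove rows that are close to or hard to separate from the minority class.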
Oversampling
Oversampling is a set of techniques that duplicate examples from the minority class, or synthesize new ones from them, and add the result to the training dataset.
- Random oversampling
- SMOTE
- ADASYN
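Random oversampling repeats existing minority rows, whereas SMOTE-style methods create new rows by interpolating between a minority point and one of its minority neighbours. The sketch below is a deliberately simplified version of that interpolation step: it always uses the single nearest neighbour, while the real algorithm picks among k neighbours. The function name and data are illustrative assumptions.

```python
import math
import random


def smote_like(X_min, n_new, seed=0):
    """Create n_new synthetic points on segments between minority points and their nearest minority neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # Nearest other minority point (simplification: k = 1).
        neighbour = min(
            (x for j, x in enumerate(X_min) if j != i),
            key=lambda x: math.dist(base, x),
        )
        u = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class points, invented for illustration.
X_min = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0]]
new_points = smote_like(X_min, n_new=4)
print(new_points)
```

Every synthetic point lies between two real minority points, so the minority region is filled in rather than only reweighted.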
Models that rebalance target while fitting
Models that rebalance the target while fitting are ensemble methods that generate under-sampled or over-sampled subsets of the data and combine the models trained on them.
- EasyEnsembleClassifier
- RUSBoostClassifier
- BalancedBaggingClassifier
- BalancedRandomForestClassifier
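The common idea behind the classifiers listed above can be sketched as an ensemble whose members are each trained on a balanced random subset of the data. The example below is only an illustration of that idea: it uses a nearest-centroid rule as a stand-in base learner, whereas library implementations use stronger learners such as boosted or bagged trees. All names and data are assumptions.

```python
import math
import random


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_balanced_ensemble(X, y, n_members=5, seed=0):
    """Fit one nearest-centroid learner per balanced random subset of the data."""
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n = min(len(pos), len(neg))
    members = []
    for _ in range(n_members):
        # Each member sees the same number of rows from both classes.
        members.append((centroid(rng.sample(neg, n)), centroid(rng.sample(pos, n))))
    return members


def predict(members, x):
    """Majority vote: each member votes for the class of its closer centroid."""
    votes = sum(math.dist(x, c1) < math.dist(x, c0) for c0, c1 in members)
    return 1 if votes * 2 > len(members) else 0


# Toy data: 8 majority rows near the origin, 2 minority rows near (5, 5).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [0.3, 0.3],
     [5, 5], [6, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
members = fit_balanced_ensemble(X, y)
print(predict(members, [4.5, 5.0]), predict(members, [0.2, 0.1]))
```

Because every member is fitted on a different majority subset, the ensemble as a whole still uses most of the majority data even though no single member sees the imbalance.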
Model evaluation and model selection
Five models perform better than the others: No resampling, Easy Ensemble, Balanced Random Forest, Random Undersampling, and Random Oversampling. We now compare them by the best profit each can deliver. We assume that the average credit card debt is the same for clients who default and clients who do not. We also assume that in case of default we lose all the money the client spent from the credit card, and we use an interest rate of 20% because it is the closest value to current real credit card interest rates in US dollars.
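One way to turn these assumptions into a profit figure is sketched below. It is an interpretation, not the notebook's formula: each approved non-defaulter is assumed to earn the bank the average debt times the interest rate, each approved defaulter to cost the full average debt, and rejected clients to contribute nothing. The `avg_debt` value and the labels are hypothetical.

```python
def expected_profit(y_true, y_pred, avg_debt, interest_rate=0.20):
    """Profit over approved clients (predicted 0): interest from good clients, full loss on defaulters."""
    profit = 0.0
    for t, p in zip(y_true, y_pred):
        if p == 1:
            continue  # rejected: no loan issued, so no gain and no loss
        profit += avg_debt * interest_rate if t == 0 else -avg_debt
    return profit


# Hypothetical portfolio: 8 good clients and 2 defaulters (1 = default).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
approve_all = [0] * 10                       # baseline without a model
model_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # one good client denied, one defaulter missed

print("no model:", expected_profit(y_true, approve_all, avg_debt=1000))
print("model:   ", expected_profit(y_true, model_pred, avg_debt=1000))
```

Under these assumptions one missed defaulter cancels the interest earned on five good clients, which is why FOR weighs more heavily than FPR in the comparison.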
Model evaluation by profitability

The Stacking of Balanced Random Forest, CatBoost and XGBoost gives the best result by profit. Surprisingly, this model was trained on resampled data (Stacking with Balanced Random Forest trained on the original data is not shown, but its result was worse), so we conclude that additional balanced fitting after undersampling also has a positive effect on the model. This model gives a 15% increase in revenue, which is a significant improvement for the bank. We would also like to draw attention to the CatBoost model: its financial result is as good as that of the best model, but it approves more clients than the others. The bank may prefer this model if it is also interested in attracting clients to other products through credit cards.

In addition, in place of a GUI, I have created a Telegram bot in the Scoring_bot/Scoring.py file that contains the best model. The bot lets a user fill in a form and find out whether he or she may receive a loan.

Here is an example of how it works:
