This project develops a credit scoring model for "Prêt à Dépenser," a consumer credit company. The goal is to predict the probability of loan default for clients with limited credit history. The project incorporates data preprocessing, machine learning, and advanced interpretability techniques to ensure transparency and improve decision-making in loan approvals.
- Data Exploration:
- Analyze a dataset of 307,000 clients with 121 features.
- Handle class imbalance in the target variable (`TARGET`).
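A quick look at the class distribution illustrates the imbalance. A minimal sketch, assuming the data lives in an `application_train.csv` file (the path and file name are illustrative):

```python
import pandas as pd

# Load the application data (illustrative path).
df = pd.read_csv("application_train.csv")

print(df.shape)  # on the order of 307,000 rows, 121 features plus TARGET

# TARGET = 1 marks clients with payment difficulties; the positive class is
# only a small fraction of the data, which motivates rebalancing later on.
print(df["TARGET"].value_counts(normalize=True))
```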
- Model Development:
- Train and evaluate classification models (Logistic Regression, Random Forest, XGBoost, LightGBM).
- Optimize hyperparameters using grid search with AUC-ROC as the key metric.
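A minimal sketch of the hyperparameter search with scikit-learn's `GridSearchCV`, scored on ROC-AUC; the grid and the synthetic stand-in data are illustrative:

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the preprocessed application data.
X, y = make_classification(n_samples=5000, n_features=30, weights=[0.92], random_state=42)

# Illustrative grid; the real search space may differ.
param_grid = {
    "num_leaves": [31, 63],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 500],
}

search = GridSearchCV(
    estimator=LGBMClassifier(class_weight="balanced", random_state=42),
    param_grid=param_grid,
    scoring="roc_auc",  # AUC-ROC drives model selection
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```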
- Interpretability:
- Use SHAP (SHapley Additive ExPlanations) for global and local feature importance.
- Provide transparency in credit decision-making.
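A sketch of the global SHAP analysis, assuming shap's Explanation-based plotting API; `model` stands for the fitted tree classifier and `X_valid` for a held-out feature frame:

```python
import shap

# `model` is the fitted tree-based classifier and `X_valid` a held-out
# feature frame (both come from the modeling step; names are placeholders).
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_valid)

# Global view: features ranked by mean absolute SHAP value across clients.
shap.plots.beeswarm(shap_values)
```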
- Data Preprocessing:
- Encoded categorical variables using Label Encoding and One-Hot Encoding.
- Addressed missing values with imputation techniques.
- Balanced target classes using SMOTE and undersampling.
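A condensed sketch of these preprocessing steps; the file path, imputation strategy, and encoding threshold are illustrative choices:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from imblearn.over_sampling import SMOTE

df = pd.read_csv("application_train.csv")  # illustrative path

# Label-encode binary categoricals, one-hot encode the rest.
for col in df.select_dtypes("object"):
    if df[col].nunique() <= 2:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
df = pd.get_dummies(df)

# Impute remaining missing values (median keeps outliers from dominating).
X = df.drop(columns="TARGET")
y = df["TARGET"]
X = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(X), columns=X.columns)

# Rebalance the training classes with SMOTE oversampling.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
```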
- Modeling:
- Evaluated models: Dummy Classifier, Logistic Regression, Random Forest, XGBoost, LightGBM.
- LightGBM selected as the best model based on AUC-ROC and F2 scores.
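A sketch of the model comparison, assuming preprocessed arrays `X` and `y` from the steps above and cross-validated AUC-ROC as the comparison criterion (model settings are illustrative):

```python
from lightgbm import LGBMClassifier
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

models = {
    "Dummy": DummyClassifier(strategy="most_frequent"),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
}

# Cross-validated AUC-ROC for each candidate; the best scorer is retained.
for name, clf in models.items():
    auc = cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean()
    print(f"{name}: AUC-ROC = {auc:.3f}")
```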
- Monitoring:
- Used Evidently AI to detect data drift in 10 key columns, including `TARGET`.
- Recommended proactive model updates to adapt to changing data distributions.
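A minimal drift-check sketch, assuming the `Report` / `DataDriftPreset` interface of Evidently 0.4-style releases and two pandas frames, `reference` (training data) and `current` (recent production data):

```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Compare each column's distribution in the new data against the reference
# (training) data; Evidently flags columns whose drift test fails.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")
```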
- Evaluation Metrics:
- AUC-ROC (Area Under the ROC Curve)
- F2 Score
- Recall
- Precision
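These metrics can be reproduced with scikit-learn. A sketch assuming a fitted classifier `clf` and a held-out split `X_test`, `y_test`; F2 weights recall twice as heavily as precision, which suits the cost of missing a defaulter:

```python
from sklearn.metrics import fbeta_score, precision_score, recall_score, roc_auc_score

y_proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of default
y_pred = (y_proba >= 0.5).astype(int)      # illustrative decision threshold

print("AUC-ROC  :", roc_auc_score(y_test, y_proba))
print("F2       :", fbeta_score(y_test, y_pred, beta=2))
print("Recall   :", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
```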
- Best Model: LightGBM achieved the highest AUC-ROC and balanced performance across metrics.
- Feature Importance:
- External credit score sources (`EXT_SOURCE_1`, `EXT_SOURCE_2`, `EXT_SOURCE_3`) were the most influential features.
- Loan amount (`AMT_CREDIT`) and client age (`DAYS_BIRTH`) also played significant roles.
- Interpretability:
- SHAP values provided actionable insights for client-level predictions.
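For an individual client, a local explanation can be rendered as a waterfall plot; a sketch assuming the `explainer` and `X_valid` from the global analysis above (row 0 is just an example client):

```python
import shap

# Explanation for one client: each bar shows how a feature pushes the
# prediction away from the base value (average model output).
explanation = explainer(X_valid.iloc[[0]])
shap.plots.waterfall(explanation[0])
```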
- Jupyter Notebooks:
- Exploratory Data Analysis (EDA) and Feature Engineering.
- Model training, evaluation, and interpretability analysis.
- Presentation:
- Summarized methodology, results, and future improvements.
- Documentation:
- Detailed methodological note for transparency.
- Future Improvements:
- Explore additional feature engineering using external data sources.
- Automate model retraining to handle data drift.
- Integrate client feedback to refine predictions.