Welcome to the Fraud Detection Project! This project focuses on identifying fraudulent transactions using data science techniques. Conducted during my studies at Gomycode in Yaba, Nigeria, this project involved building a robust fraud detection system to accurately identify fraudulent activities from a dataset of financial transactions.
- Python 🐍: Primary programming language used for data manipulation, analysis, and model building.
- Pandas 📚: Data manipulation and analysis library.
- NumPy 🔢: Numerical computing library for handling arrays and mathematical operations.
- Plotly 🌈: Visualization of plots
- Scikit-Learn ⚙️: Machine learning library used for building and evaluating models.
- Matplotlib 📈 & Seaborn 🌈: Visualization libraries for creating charts and plots.
- Jupyter Notebook 📓: Interactive environment for developing and documenting code.
- Colab
- Vscode
The main objective of this project was to detect fraudulent transactions using a machine learning model. Fraudulent transactions can cause significant financial losses, and detecting them accurately is crucial for financial institutions.
📊 Dataset The dataset used for this project included various features of financial transactions. Key attributes included:
Transaction Amount 💵
- Transaction Date 📅
- Transaction Location 📍
- Account Information 🏦
- Transaction Type 🔄
- and more
One of the primary challenges encountered was data imbalance. In fraud detection, fraudulent transactions are much rarer compared to legitimate ones. This imbalance can lead to a model that is biased towards predicting non-fraudulent transactions, reducing its effectiveness in detecting fraud.
To address the data imbalance issue, we employed undersampling techniques. Here's how it was implemented:
Identify Majority and Minority Classes: The dataset was split into fraudulent and non-fraudulent transactions. Undersample Majority Class: The number of non-fraudulent transactions was reduced to match the number of fraudulent transactions, creating a balanced dataset.
- Data Preprocessing:
- Cleaning: Handled missing values and outliers.
- Feature Engineering: Created new features to enhance model performance.
- Normalization: Scaled numerical features for consistent model input.
- Exploratory Data Analysis (EDA):
- Visualization: Created histograms, scatter plots, and correlation matrices to understand data distributions and relationships.
- Model Building:
- Algorithms: Tested various algorithms including Logistic Regression, Decision Trees, and Random Forests.
- Evaluation Metrics: Used metrics such as Precision, Recall, F1-Score, and ROC-AUC to evaluate model performance.
- Cross-Validation: Performed cross-validation to ensure the model generalizes well to unseen data.
- Hyperparameter Tuning: Optimized model parameters to improve accuracy.
Achieved a balanced model performance with significant improvement in detecting fraudulent transactions compared to the baseline model. s, from data preprocessing to model evaluation.
- Model Accuracy: Achieved an accuracy of X% with the undersampled dataset.
- Fraud Detection Rate: Improved detection of fraudulent transactions with a recall rate of Y%.