Student Attrition Risk Prediction

Overview

This project predicts the risk of student attrition by analyzing various features, such as academic performance, attendance, and extracurricular involvement. By using machine learning models, it identifies students at potential risk of dropping out and provides insights for timely intervention.

Dataset

The dataset used is synthetic and randomly generated, aiming to simulate real-world student data with features such as academic scores, attendance percentage, part-time job status, extracurricular activities, and weekly study hours.
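As a rough illustration of how such a synthetic dataset could be generated, here is a minimal sketch using only the Python standard library. The column names, value ranges, and the function name make_synthetic_students are assumptions made for this example; they are not taken from trial_data.csv or the notebooks.

```python
import random


def make_synthetic_students(n_students, seed=0):
    """Return a list of dicts, one per simulated student.

    All field names and value ranges are hypothetical.
    """
    rng = random.Random(seed)  # seeded so the data is reproducible
    rows = []
    for student_id in range(n_students):
        rows.append({
            "student_id": student_id,
            "academic_score": round(rng.uniform(0, 100), 1),    # percentage marks
            "attendance_pct": round(rng.uniform(40, 100), 1),   # attendance percentage
            "part_time_job": rng.choice([0, 1]),                # 1 = has a part-time job
            "extracurriculars": rng.randint(0, 5),              # number of activities
            "weekly_study_hours": round(rng.uniform(0, 40), 1),
        })
    return rows


students = make_synthetic_students(5)
for row in students:
    print(row)
```

Fixing the seed keeps the generated rows identical between runs, which makes model comparisons repeatable.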

Project Structure

This project includes two main files:

student_dropout_risk.ipynb: Contains the final workflow for the models selected on performance. It handles the primary predictions and evaluations.
model_selection.ipynb: Trains multiple models (Random Forest, Gradient Boosting, Linear Regression, SVC, etc.) on the dataset. The best classification and regression models are chosen here and then used in student_dropout_risk.ipynb for further tuning and evaluation.
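The comparison step in model_selection.ipynb might look roughly like the sketch below, which scores a few scikit-learn classifiers with cross-validation and picks the highest mean score. The data comes from make_classification as a stand-in, and the candidates dictionary and compare_models helper are hypothetical names for this example, not the notebook's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in data (assumption): 300 samples, 5 numeric features, binary label.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=0
)

# Candidate classifiers to compare (small settings to keep the run fast).
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "svc": SVC(),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated accuracy for each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y)
best_name = max(scores, key=scores.get)  # model with the highest mean accuracy
print(scores)
print("best:", best_name)
```

The same loop works for regressors by swapping in regression models and a regression target; cross_val_score then reports R² by default.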

Methodology

The methodology followed in this project includes:

Data Preprocessing: Cleaning and preparing synthetic data for model training.
Feature Engineering: Creating additional features to enhance model effectiveness.
Model Training: Testing various machine learning models for optimal performance.
Hyperparameter Tuning: Fine-tuning model parameters to improve accuracy and minimize overfitting.
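The training and tuning steps above can be sketched with scikit-learn's GridSearchCV. This is an illustration under stated assumptions: the data is generated with make_regression rather than loaded from the project's dataset, and the parameter grid is a small hypothetical one, not the grid used in the notebooks.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data (assumption): 200 samples, 5 numeric features.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Hold out a test set so tuning never sees the evaluation data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid; limiting max_depth is one way to curb overfitting.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,           # 3-fold cross-validation on the training split
    scoring="r2",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("held-out R^2:", search.score(X_test, y_test))
```

Comparing the cross-validated score with the held-out score is a simple check for overfitting: a large gap suggests the tuned model does not generalize.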

Models Used and Performance

Several models were tested and evaluated, including:

Random Forest Regressor: Achieved an R² score of 0.9934 with a Mean Squared Error (MSE) of 0.0007, making it the best-performing regression model.
Gradient Boosting Classifier: Achieved a cross-validation accuracy near 99% and was selected as the best classifier.
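For readers unfamiliar with the regression metrics quoted above, the following sketch computes MSE and R² from their standard definitions in plain Python. The helper names mse and r_squared and the sample values are made up for illustration; they are not the project's predictions.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only (assumption): risk scores between 0 and 1.
y_true = [0.1, 0.4, 0.6, 0.9]
y_pred = [0.1, 0.5, 0.6, 0.8]

print("MSE:", mse(y_true, y_pred))
print("R^2:", r_squared(y_true, y_pred))
```

An R² close to 1 means the predictions explain almost all of the variance in the target, while an MSE close to 0 means the individual errors are small on the target's scale.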

Model Performance Summary

The selected models, Random Forest Regressor for regression tasks and Gradient Boosting Classifier for classification tasks, demonstrated strong predictive performance. Random Forest obtained an R² of 0.9934, and Gradient Boosting achieved a cross-validation accuracy near 99%.

Execution Steps

Clone the repository or download the code, then open the notebooks in Google Colab.
Install the required dependencies.
Run the cells in order, one step at a time.
When prompted, enter the subjects and other feature values to get the results.

Example: in the output, the chart on the right shows the minimum conditions a student should meet. The pie chart is helpful for gap analysis.
