This project aims to predict the risk of student attrition by analyzing various features, such as academic performance, attendance, and involvement in extracurricular activities. By utilizing machine learning models, this project provides insights into potential risk factors for student dropout and suggests proactive measures for student retention.

Manraj29/Student-Dropout-Attrition-Risk

Student Attrition Risk Prediction

Overview

This project predicts the risk of student attrition by analyzing various features, such as academic performance, attendance, and extracurricular involvement. By using machine learning models, it identifies students at potential risk of dropping out and provides insights for timely intervention.

Dataset

The dataset used is synthetic and randomly generated, aiming to simulate real-world student data with features such as academic scores, attendance percentage, part-time job status, extracurricular activities, and weekly study hours.
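The notebooks do not show the exact generation procedure, so the following is only a minimal sketch of how such a synthetic dataset could be produced with the standard library. The column names (`academic_score`, `attendance_pct`, `part_time_job`, `extracurricular`, `study_hours_per_week`), the value ranges, and the weighted `dropout_risk` formula are all hypothetical assumptions chosen for illustration.

```python
import random


def generate_students(n, seed=42):
    """Generate n synthetic student records (hypothetical schema and risk formula)."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    rows = []
    for _ in range(n):
        academic_score = round(rng.uniform(40, 100), 1)
        attendance_pct = round(rng.uniform(50, 100), 1)
        part_time_job = int(rng.random() < 0.3)  # 1 = has a part-time job
        extracurricular = int(rng.random() < 0.5)  # 1 = involved in activities
        study_hours = round(rng.uniform(0, 30), 1)
        # Hypothetical risk score: low grades, low attendance, little study time
        # and a part-time job raise risk; extracurricular involvement lowers it.
        risk = (
            0.35 * (100 - academic_score) / 60
            + 0.35 * (100 - attendance_pct) / 50
            + 0.15 * (30 - study_hours) / 30
            + 0.10 * part_time_job
            - 0.05 * extracurricular
        )
        risk = round(min(max(risk, 0.0), 1.0), 4)  # clamp to [0, 1]
        rows.append({
            "academic_score": academic_score,
            "attendance_pct": attendance_pct,
            "part_time_job": part_time_job,
            "extracurricular": extracurricular,
            "study_hours_per_week": study_hours,
            "dropout_risk": risk,  # regression target
            "at_risk": int(risk >= 0.5),  # classification target
        })
    return rows


students = generate_students(1000)
print(students[0])
```

The list of dicts can be loaded into pandas with `pandas.DataFrame(students)`. Producing both a continuous `dropout_risk` and a binary `at_risk` column mirrors the project's use of one regression model and one classification model.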

Project Structure

This project includes two main files:

  • student_dropout_risk.ipynb: The final workflow. It uses the best-performing models selected in model_selection.ipynb to generate and evaluate the primary predictions.
  • model_selection.ipynb: Trains and compares multiple models (Random Forest, Gradient Boosting, Linear Regression, SVC, etc.) on the dataset. The best classification and regression models identified here are then tuned and evaluated further in student_dropout_risk.ipynb.
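The comparison step in model_selection.ipynb can be sketched with scikit-learn's `cross_val_score`: score each candidate with k-fold cross-validation and keep the one with the highest mean score. This is an illustration under assumptions, not the notebook's code. The helper `pick_best` is hypothetical, and scikit-learn's `make_classification` / `make_regression` stand in for the student dataset.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (
    GradientBoostingClassifier,
    GradientBoostingRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def pick_best(models, X, y, scoring, cv=5):
    """Return the name of the model with the best mean k-fold CV score, plus all scores."""
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
        for name, model in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Stand-in data; the project itself uses its synthetic student dataset.
Xc, yc = make_classification(n_samples=300, n_features=5, random_state=0)
Xr, yr = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)

classifiers = {
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svc": SVC(),
}
regressors = {
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "linear_regression": LinearRegression(),
}

best_clf, clf_scores = pick_best(classifiers, Xc, yc, scoring="accuracy")
best_reg, reg_scores = pick_best(regressors, Xr, yr, scoring="r2")
print("best classifier:", best_clf, clf_scores)
print("best regressor:", best_reg, reg_scores)
```

Because the stand-in regression data is generated from a linear model, `LinearRegression` wins here. On the project's data the README reports that Random Forest was the best regressor.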

Methodology

The methodology followed in this project includes:

  • Data Preprocessing: Cleaning and preparing synthetic data for model training.
  • Feature Engineering: Creating additional features to enhance model effectiveness.
  • Model Training: Testing various machine learning models for optimal performance.
  • Hyperparameter Tuning: Fine-tuning model parameters to improve accuracy and minimize overfitting.
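As a sketch of how the preprocessing and tuning steps could fit together, the following code wraps a median imputer (cleaning) and a `GradientBoostingClassifier` in a scikit-learn `Pipeline`, then searches a small grid with `GridSearchCV`. The feature columns, the labeling rule, the injected missing values, and the grid values are hypothetical stand-ins. The notebook's actual preprocessing and search space are not documented in this README.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n = 400
# Hypothetical columns: academic score, attendance %, weekly study hours.
X = np.column_stack([
    rng.uniform(40, 100, n),
    rng.uniform(50, 100, n),
    rng.uniform(0, 30, n),
])
# Hypothetical label: at risk when the score or the attendance is low.
y = ((X[:, 0] < 60) | (X[:, 1] < 65)).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing values to clean

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # data preprocessing
    ("model", GradientBoostingClassifier(random_state=0)),
])
# Parameters of a pipeline step are addressed as "<step>__<param>".
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
    "model__learning_rate": [0.05, 0.1],
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}")
print(f"held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Placing the imputer inside the pipeline means it is re-fit on each training fold, so the cross-validation scores do not leak statistics from the validation folds.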

Models Used and Performance

Several models were tested and evaluated, including:

  • Random Forest Regressor: Achieved an R² score of 0.9934 with Mean Squared Error (MSE) of 0.0007, making it the best-performing regression model.
  • Gradient Boosting Classifier: Achieved high classification accuracy, with a cross-validation score near 99%, and was selected as the best classifier.
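To make the regression figures concrete, here are the standard definitions of MSE and R² written out in plain Python. scikit-learn's `sklearn.metrics.mean_squared_error` and `r2_score` compute the same quantities. The functions below are hand-written illustrations that share those names, not the project's code.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals (y_true - y_pred)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: the share of variance in y_true explained by y_pred."""
    n = len(y_true)
    ss_res = mean_squared_error(y_true, y_pred) * n  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))  # 0.375
print(round(r2_score(y_true, y_pred), 4))  # 0.9486
```

An R² of 0.9934 therefore means the regressor's residual sum of squares is about 0.66% of the total variance of the target around its mean.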

Model Performance Summary

The selected models, Random Forest Regressor for regression and Gradient Boosting Classifier for classification, demonstrated strong predictive performance on the synthetic dataset: Random Forest obtained an R² of 0.9934 (MSE 0.0007), and Gradient Boosting reached a cross-validation accuracy near 99%.

Execution Steps

  1. Clone the repository (or download the code) and open the notebooks in Google Colab.
  2. Install the required dependencies.
  3. Run the cells in order.
  4. When prompted, enter values for the subjects and other features, then view the results.
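Step 4 collects user input, and raw answers need to be validated and converted into a numeric row before a fitted model can use them. The following is a minimal sketch of that conversion. The feature names, their allowed ranges, the yes/no encoding, and the helper `parse_student_input` are hypothetical. The column order must match whatever order the model was trained on.

```python
# Hypothetical numeric features and their allowed ranges.
NUMERIC_FEATURES = {
    "academic_score": (0.0, 100.0),
    "attendance_pct": (0.0, 100.0),
    "study_hours_per_week": (0.0, 168.0),
}
# Hypothetical yes/no features, encoded as 1.0 / 0.0.
BINARY_FEATURES = ("part_time_job", "extracurricular")


def parse_student_input(raw):
    """Validate raw string answers and return one numeric feature row."""
    row = []
    for name, (low, high) in NUMERIC_FEATURES.items():
        value = float(raw[name])  # raises ValueError on non-numeric input
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        row.append(value)
    for name in BINARY_FEATURES:
        answer = raw[name].strip().lower()
        if answer not in ("yes", "no"):
            raise ValueError(f"{name} must be 'yes' or 'no', got {answer!r}")
        row.append(1.0 if answer == "yes" else 0.0)
    return row


# In a notebook the answers could come from input(), e.g.
#   raw = {name: input(f"{name}: ") for name in [*NUMERIC_FEATURES, *BINARY_FEATURES]}
# and the row would be passed to a fitted model as model.predict([row]).
example = {
    "academic_score": "72.5",
    "attendance_pct": "90",
    "study_hours_per_week": "15",
    "part_time_job": "Yes",
    "extracurricular": "no",
}
print(parse_student_input(example))  # [72.5, 90.0, 15.0, 1.0, 0.0]
```

Rejecting out-of-range values up front keeps obviously invalid answers (such as 120% attendance) from silently reaching the model.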

Example output (screenshots not included here): the chart on the right shows the minimum conditions a user should meet, and the pie chart is helpful for gap analysis.
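The gap analysis behind the minimum-conditions chart and pie chart can be sketched as follows. First, compute how far a student falls below each minimum. Then express each shortfall relative to its minimum, so that features with different units (points, percent, hours) can share one pie. The thresholds in `MIN_CONDITIONS` and both helper functions are hypothetical, because the notebook's actual minimums are not stated.

```python
# Hypothetical minimum conditions; the notebook's real thresholds are not given.
MIN_CONDITIONS = {
    "academic_score": 60.0,
    "attendance_pct": 75.0,
    "study_hours_per_week": 10.0,
}


def gap_analysis(student, minimums=MIN_CONDITIONS):
    """Shortfall below each minimum, in the feature's own units (0 if the minimum is met)."""
    return {name: max(0.0, minimum - student[name]) for name, minimum in minimums.items()}


def gap_shares(gaps, minimums=MIN_CONDITIONS):
    """Each feature's share of the total relative shortfall (suitable for a pie chart)."""
    relative = {name: gap / minimums[name] for name, gap in gaps.items() if gap > 0}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()} if total else {}


student = {"academic_score": 52.0, "attendance_pct": 70.0, "study_hours_per_week": 12.0}
gaps = gap_analysis(student)
print(gaps)  # {'academic_score': 8.0, 'attendance_pct': 5.0, 'study_hours_per_week': 0.0}
shares = gap_shares(gaps)
print(shares)
# With matplotlib installed, the shares could be drawn as a pie:
#   import matplotlib.pyplot as plt
#   plt.pie(list(shares.values()), labels=list(shares.keys())); plt.show()
```

Normalizing by the minimum is a design choice: it treats being 8 points short of 60 (about 13%) as a larger gap than being 5 points short of 75 (about 7%), which raw differences in mixed units would not capture.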
