This project predicts the risk of student attrition by analyzing various features, such as academic performance, attendance, and extracurricular involvement. By using machine learning models, it identifies students at potential risk of dropping out and provides insights for timely intervention.
The dataset used is synthetic and randomly generated, aiming to simulate real-world student data with features such as academic scores, attendance percentage, part-time job status, extracurricular activities, and weekly study hours.
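As a rough illustration of how such a dataset could be generated, the sketch below draws each feature at random with NumPy. The column names, value ranges, risk formula, and the 0.4 threshold are all assumptions made for this example; they are not taken from the notebooks.

```python
import numpy as np


def make_synthetic_students(n_students=500, seed=42):
    """Generate a hypothetical synthetic student table as a dict of NumPy arrays."""
    rng = np.random.default_rng(seed)
    data = {
        # Academic score on an assumed 0-100 scale
        "academic_score": rng.uniform(30, 100, n_students),
        # Attendance as a percentage
        "attendance_pct": rng.uniform(40, 100, n_students),
        # 1 if the student has a part-time job, else 0
        "part_time_job": rng.integers(0, 2, n_students),
        # Number of extracurricular activities (0 to 4)
        "extracurriculars": rng.integers(0, 5, n_students),
        # Weekly study hours
        "study_hours": rng.uniform(0, 40, n_students),
    }
    # Assumed risk formula: lower scores, attendance, and study time raise risk,
    # and a part-time job adds a little. Small Gaussian noise is added on top.
    risk = (
        0.40 * (1 - data["academic_score"] / 100)
        + 0.35 * (1 - data["attendance_pct"] / 100)
        + 0.15 * (1 - data["study_hours"] / 40)
        + 0.10 * data["part_time_job"]
    )
    risk += rng.normal(0, 0.02, n_students)
    # Continuous target for regression, clipped to [0, 1]
    data["risk_score"] = np.clip(risk, 0.0, 1.0)
    # Binary target for classification (assumed threshold)
    data["at_risk"] = (data["risk_score"] > 0.4).astype(int)
    return data


students = make_synthetic_students()
print({name: values[:3] for name, values in students.items()})
```

Fixing the seed keeps the generated data reproducible between runs, which makes model comparisons easier to repeat.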
This project includes two main files:
- student_dropout_risk.ipynb: Contains the final workflow for the models selected on performance, and handles the primary predictions and evaluations.
- model_selection.ipynb: This file trains multiple models (Random Forest, Gradient Boosting, Linear Regression, SVC, etc.) on the dataset. The best models for classification and regression are selected from this file and then used in student_dropout_risk.ipynb for further tuning and evaluation.
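A model comparison of this kind might look like the following minimal sketch, which scores several scikit-learn classifiers with 5-fold cross-validation and picks the one with the highest mean accuracy. The data comes from `make_classification` as a stand-in for the student dataset, and the candidate list and settings are assumptions. Logistic regression is used here as the linear baseline because the example is a classification task.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data; the project uses its own synthetic student dataset instead.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=0
)

# Hypothetical candidate set; scale-sensitive models get a StandardScaler.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svc": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated accuracy for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```

The same loop works for regression by swapping in regressors and a regression metric such as `scoring="r2"`.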
The methodology followed in this project includes:
- Data Preprocessing: Cleaning and preparing synthetic data for model training.
- Feature Engineering: Creating additional features to enhance model effectiveness.
- Model Training: Testing various machine learning models for optimal performance.
- Hyperparameter Tuning: Fine-tuning model parameters to improve accuracy and minimize overfitting.
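The four steps above can be sketched as a single scikit-learn pipeline tuned with `GridSearchCV`. This is a minimal example under stated assumptions: the three feature columns, the labeling rule, the engineered "engagement" feature, and the parameter grid are hypothetical and do not come from the notebooks.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

rng = np.random.default_rng(0)
n = 300
# Hypothetical columns: academic score, attendance %, weekly study hours
X = np.column_stack([
    rng.uniform(30, 100, n),
    rng.uniform(40, 100, n),
    rng.uniform(0, 40, n),
])
# Assumed labeling rule, for illustration only
y = ((X[:, 0] < 55) | (X[:, 1] < 60)).astype(int)


def add_engagement(X):
    """Feature engineering: append attendance-weighted study hours as a new column."""
    engagement = X[:, 1] / 100.0 * X[:, 2]
    return np.column_stack([X, engagement])


pipeline = Pipeline([
    ("features", FunctionTransformer(add_engagement)),   # feature engineering
    ("scale", StandardScaler()),                         # preprocessing
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Small illustrative grid; shallow trees and fewer estimators limit overfitting.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Accuracy of the best configuration on the held-out split
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the feature step and the scaler inside the pipeline means they are refit on each cross-validation fold, so no information from the validation folds leaks into training.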
Several models were tested and evaluated, including:
- Random Forest Regressor: Achieved an R² score of 0.9934 with Mean Squared Error (MSE) of 0.0007, making it the best-performing regression model.
- Gradient Boosting Classifier: Provided high classification accuracy, with a cross-validation score near 99%, and was selected as the best classifier.
The selected models, Random Forest Regressor for regression and Gradient Boosting Classifier for classification, demonstrated strong predictive performance on the synthetic dataset. Random Forest obtained an R² of 0.9934, and Gradient Boosting achieved high and stable accuracy in classification.
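For readers who want to see how metrics of this kind are typically computed, here is a hedged sketch using scikit-learn's `r2_score`, `mean_squared_error`, and `cross_val_score`. The features and targets are randomly generated stand-ins with an assumed risk formula, so the printed values will not match the figures reported above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
n = 400
# Stand-in features scaled to [0, 1]; the real notebook uses student features.
X = rng.uniform(0, 1, size=(n, 4))
# Assumed continuous risk target with a little noise
risk = (
    0.5 * (1 - X[:, 0])
    + 0.3 * (1 - X[:, 1])
    + 0.2 * X[:, 2]
    + rng.normal(0, 0.01, n)
)
# Assumed binary label derived from the risk score
at_risk = (risk > 0.5).astype(int)

# Regression: R² and MSE on a held-out test split
X_train, X_test, risk_train, risk_test = train_test_split(
    X, risk, test_size=0.2, random_state=1
)
regressor = RandomForestRegressor(n_estimators=100, random_state=1)
regressor.fit(X_train, risk_train)
predictions = regressor.predict(X_test)
r2 = r2_score(risk_test, predictions)
mse = mean_squared_error(risk_test, predictions)

# Classification: 5-fold cross-validated accuracy
classifier = GradientBoostingClassifier(random_state=1)
cv_accuracy = cross_val_score(classifier, X, at_risk, cv=5, scoring="accuracy")

print(f"R2={r2:.4f}  MSE={mse:.4f}")
print(f"CV accuracy={cv_accuracy.mean():.3f} (std {cv_accuracy.std():.3f})")
```

Reporting the standard deviation across folds alongside the mean gives a simple indication of how stable the classifier is.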
- Clone the repository or download the code, and open it in Google Colab.
- Install the required dependencies.
- Execute the cells step by step.
- Enter your inputs for the subjects and other features to get the results.
Example: the chart on the right shows the minimum conditions a user should meet. The pie chart is helpful for gap analysis.
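One simple way to think about the gap analysis is as the shortfall between a user's values and the minimum conditions. The sketch below is only an illustration of that idea: the feature names and thresholds in `MIN_CONDITIONS` and the `gap_analysis` helper are hypothetical, and the notebook may compute its charts differently.

```python
# Hypothetical minimum conditions; the notebook's actual thresholds may differ.
MIN_CONDITIONS = {
    "academic_score": 60.0,
    "attendance_pct": 75.0,
    "study_hours": 10.0,
}


def gap_analysis(student, minimums=MIN_CONDITIONS):
    """Return how far each feature falls short of its minimum (0.0 if it is met)."""
    return {
        name: max(0.0, minimum - student[name])
        for name, minimum in minimums.items()
    }


# Example user input (made-up values)
student = {"academic_score": 52.0, "attendance_pct": 80.0, "study_hours": 6.5}
gaps = gap_analysis(student)
print(gaps)
```

Features with a nonzero gap are the ones a pie chart of shortfalls would highlight.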