This project suggests a suitable stream (Science, Commerce, Humanities, or Vocational Courses) for each student from their subject scores. A logistic regression model classifies students by total score, and each prediction is checked against a simple rule-based assignment. The dataset contains students' scores in various subjects, and the rule-based stream is assigned using predefined total-score thresholds.
The dataset used for this project can be found here. It includes the following:
- Subject scores for each student.
- Note: The dataset excludes the "Email Address" column to maintain privacy and focuses only on subject scores.
Score Calculation:
- The total score is computed by summing individual subject scores for each student.
Stream Suggestion:
- Based on the total score, a stream is suggested:
  - Science: Total score ≥ 80
  - Commerce: 60 ≤ Total score < 80
  - Humanities: 40 ≤ Total score < 60
  - Vocational Courses: Total score < 40
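The thresholds above can be written as a small function. This is a minimal sketch: the name `suggest_stream` comes from this README, but the exact signature and return values in the script are assumptions.

```python
def suggest_stream(total_score):
    """Map a total score to a stream using the thresholds listed above."""
    if total_score >= 80:
        return "Science"
    if total_score >= 60:
        return "Commerce"
    if total_score >= 40:
        return "Humanities"
    return "Vocational Courses"


# Boundary values fall into the higher stream because the lower bounds are inclusive.
for score in (85, 80, 79, 60, 40, 39):
    print(score, suggest_stream(score))
```

Checking the thresholds from highest to lowest keeps each branch to a single comparison and makes the inclusive lower bounds easy to verify.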
Logistic Regression:
- The data is split into training and testing sets.
- Logistic regression is used to train a model for predicting the stream based on the total score.
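The split-and-train step might look roughly like the sketch below. It uses synthetic totals from 0 to 100 in place of the real dataset, and the 80/20 split, `random_state`, `stratify`, and `max_iter` values are illustrative assumptions rather than the script's actual settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def label_for(total):
    # Rule-based label, using the thresholds from this README.
    if total >= 80:
        return "Science"
    if total >= 60:
        return "Commerce"
    if total >= 40:
        return "Humanities"
    return "Vocational Courses"


# Synthetic stand-in for the dataset: one total score per student (assumed 0-100).
totals = np.arange(0, 101)
X = totals.reshape(-1, 1)            # scikit-learn expects a 2-D feature matrix
y = [label_for(t) for t in totals]   # string labels; scikit-learn accepts these directly

# Hold out 20% of the students for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A generous iteration limit, since the single feature is not scaled.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

print("Classes learned:", list(model.classes_))
print("Train size:", len(X_train), "Test size:", len(X_test))
```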
Model Evaluation:
- The model is evaluated using the accuracy score, and its predictions are compared against the rule-based expected streams.
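To illustrate the evaluation, the sketch below computes accuracy and lists disagreements for five made-up test students. Both label lists are hypothetical; in the script they would come from the rule-based assignment and from the model's predictions on the test set.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for five test students (not real results).
expected = ["Science", "Commerce", "Humanities", "Vocational Courses", "Commerce"]
predicted = ["Science", "Commerce", "Commerce", "Vocational Courses", "Commerce"]

# Fraction of students where the model agrees with the rule.
accuracy = accuracy_score(expected, predicted)
print(f"Accuracy: {accuracy:.2f}")

# Collect the positions where model and rule disagree, for inspection.
mismatches = [
    (i, exp, pred)
    for i, (exp, pred) in enumerate(zip(expected, predicted))
    if exp != pred
]
for i, exp, pred in mismatches:
    print(f"Student {i}: rule says {exp}, model says {pred}")
```

Listing the individual mismatches alongside the accuracy makes it easier to see whether errors cluster near a threshold.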
New Student Prediction:
- A new random student score is generated, and the model predicts the appropriate stream.
- The prediction is cross-checked with the rule-based stream assignment to ensure consistency.
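A sketch of the new-student check, under the assumption that total scores lie between 0 and 100: the model here is trained on synthetic totals so the example is self-contained, and the random seed is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def suggest_stream(total_score):
    # Rule-based assignment, using the thresholds from this README.
    if total_score >= 80:
        return "Science"
    if total_score >= 60:
        return "Commerce"
    if total_score >= 40:
        return "Humanities"
    return "Vocational Courses"


# Train on synthetic totals (a stand-in for the real dataset).
totals = np.arange(0, 101)
X = totals.reshape(-1, 1)
y = [suggest_stream(t) for t in totals]
model = LogisticRegression(max_iter=5000).fit(X, y)

# Generate one random total score for a new student.
rng = np.random.default_rng(seed=0)
new_total = int(rng.integers(0, 101))  # upper bound is exclusive, so 0..100

predicted = model.predict([[new_total]])[0]
expected = suggest_stream(new_total)

print(f"New student total: {new_total}")
print(f"Model prediction:  {predicted}")
print(f"Rule-based stream: {expected}")
print("Consistent" if predicted == expected else "Mismatch")
```

A mismatch is most likely for totals close to a threshold, where the fitted decision boundary may not coincide exactly with the rule.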
- The dataset is loaded and unnecessary columns, such as email addresses, are dropped.
- Missing values are handled, and a total score is calculated for each student.
- A function `suggest_stream` is used to assign streams based on the total score.
- The target labels (streams) are encoded using `LabelEncoder` to prepare for training.
- The dataset is split into training and testing sets, and a logistic regression model is trained.
- A random score for a new student is generated, and the stream prediction is validated against the rule-based assignment.
- The model’s performance is evaluated with accuracy and checked for consistency between the predicted and expected streams.
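The data-preparation steps in this walkthrough could be sketched as follows. The subject column names, the sample rows, and filling missing scores with 0 are all assumptions made for illustration; only the "Email Address" column, `suggest_stream`, and `LabelEncoder` are named in this README.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def suggest_stream(total_score):
    # Rule-based assignment, using the thresholds from this README.
    if total_score >= 80:
        return "Science"
    if total_score >= 60:
        return "Commerce"
    if total_score >= 40:
        return "Humanities"
    return "Vocational Courses"


# Hypothetical data: four subjects, each assumed to be scored out of 25.
df = pd.DataFrame({
    "Email Address": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
    "Math": [22, 18, 10, 17],
    "Physics": [21, 15, 8, 16],
    "English": [20, np.nan, 9, 18],
    "History": [19, 17, 7, 15],
})

# Drop the column that is not needed for prediction.
df = df.drop(columns=["Email Address"])

# Handle missing values; treating a missing score as 0 is an assumption.
score_columns = list(df.columns)
df[score_columns] = df[score_columns].fillna(0)

# Total score per student, then the rule-based stream.
df["Total"] = df[score_columns].sum(axis=1)
df["Stream"] = df["Total"].apply(suggest_stream)

# Encode stream names as integers (classes are sorted alphabetically).
encoder = LabelEncoder()
df["StreamEncoded"] = encoder.fit_transform(df["Stream"])

print(df[["Total", "Stream", "StreamEncoded"]])
print("Encoding order:", list(encoder.classes_))
```

Keeping the fitted `encoder` around allows `encoder.inverse_transform` to turn integer predictions back into stream names.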
- Install the required libraries:
pip install pandas numpy scikit-learn
- Run the script:
python stream_suggestion.py
- The script will output:
  - Stream distribution in the dataset.
  - Model accuracy.
  - Suggested streams for new students.
  - Comparison between model prediction and rule-based suggestions.
This model predicts a student's stream from their total score and checks each prediction against the rule-based logic. The logistic regression model automates the stream suggestion, and because its training labels are produced by the same thresholds, the cross-check shows how closely the model has learned the rule.