Backorders are unavoidable, but by anticipating which products will be backordered, planning can be streamlined at several levels, preventing unexpected strain on production, logistics, and transportation. ERP systems generate a lot of data (mainly structured) and also contain a lot of historical data; if this data is properly utilized, a predictive model can be constructed to forecast backorders and plan accordingly. Based on past inventory, supply chain, and sales data, the task is to classify whether a product will go on backorder (Yes or No).
The objective of this project is to build a solution to the above problem that predicts whether a product will go on backorder.
app link: https://backorder-prediction-ml.herokuapp.com/
link: https://youtu.be/s6lLTewy2g8
We have used a layered architecture, built with the following tools and technologies:
- Jupyter Notebook
- VS Code
- Flask
- Machine Learning Algorithms: Balanced Random Forest Classifier and Easy Ensemble Classifier
- MLOps
- HTML
We have taken the data from Kaggle. It is historical data with around 1.6 million datapoints in the training dataset and 30,000 datapoints in the test dataset.
data link: https://github.com/rodrigosantis1/backorder_prediction/blob/master/dataset.rar
There are seven packages in the pipeline: Config, Entity, Constant, Exception, Logger, Components, and Pipeline.
- Config: creates all folder structures and provides inputs to each of the components.
- Entity: defines a named tuple for each component's config and the artifacts it generates.
- Constant: contains all predefined constants, which can be accessed from anywhere in the project.
- Exception: contains the custom exception class for the prediction application.
- Logger: logs all the activity.
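The Exception and Logger packages typically work together: the custom exception wraps the original error with its location, and the logger records it. A minimal sketch, with hypothetical class and message formats (the project's actual names may differ):

```python
# Hypothetical sketch of the Exception and Logger packages; the class
# name and log format are illustrative, not the project's actual ones.
import logging
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

class BackorderException(Exception):
    """Custom exception that records the file and line where the error occurred."""
    def __init__(self, error: Exception):
        tb = sys.exc_info()[2]
        detail = f" [{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}]" if tb else ""
        super().__init__(f"{error}{detail}")

try:
    1 / 0
except Exception as e:
    logging.error(BackorderException(e))
```

Wrapping errors this way means every component can raise one exception type while preserving the original message and location for debugging.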
The Components package contains the following modules:
- Data Ingestion: This module downloads the data from the link, unzips it, and stores the entire dataset in a database. From the DB it extracts all the data into a single CSV file and splits it into training and testing datasets.
- Data Validation: This module validates that the data files passed conform to the defined schema agreed upon with the client.
- Data Transformation: This module applies all the feature engineering and preprocessing needed to train the model, and saves a pickle object of the fitted transformer.
- Model Trainer: This module trains the model on the transformed data, evaluates it based on the R2 accuracy score, and saves the best-performing model object for prediction.
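The Data Transformation step described above can be sketched as follows, fitting a preprocessing pipeline and pickling it for reuse at prediction time. The column names and imputation choices here are hypothetical, not the real dataset's schema:

```python
# Sketch of the Data Transformation component, under assumed column
# names ("national_inv", "lead_time") and a median-impute + scale recipe.
import pickle
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Tiny stand-in frame with missing values, like the real inventory data has.
df = pd.DataFrame({"national_inv": [10.0, None, 5.0],
                   "lead_time": [8.0, 2.0, None]})

preproc = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])
X = preproc.fit_transform(df)

# Persist the fitted transformer so the Prediction Pipeline can reuse it.
with open("preprocessor.pkl", "wb") as f:
    pickle.dump(preproc, f)
```

Saving the fitted transformer guarantees that prediction-time inputs are imputed and scaled with exactly the statistics learned from the training data.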
The Pipeline package contains two modules:
- Training Pipeline: This module initiates the training pipeline, calling each of the components mentioned above sequentially until the model is saved.
- Prediction Pipeline: This module helps in getting predictions from the saved trained model.
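The Prediction Pipeline essentially loads the saved model object and scores incoming records. A minimal sketch, using a placeholder model in place of the project's best trained model (the file name and feature vector are assumptions):

```python
# Hypothetical sketch of the Prediction Pipeline: a DummyClassifier
# stands in for the project's saved best model; "model.pkl" and the
# feature values are illustrative assumptions.
import pickle
import numpy as np
from sklearn.dummy import DummyClassifier

# Stand-in for the Model Trainer's output: a fitted model saved to disk.
model = DummyClassifier(strategy="most_frequent").fit(np.zeros((4, 2)),
                                                      [0, 0, 0, 1])
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Prediction side: load the saved object and score a new record.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)
record = np.array([[12.0, 3.5]])  # hypothetical preprocessed feature vector
label = "Yes" if loaded.predict(record)[0] == 1 else "No"
print(f"Backorder: {label}")
```

In the deployed app, the Flask layer would pass user-submitted feature values through the saved preprocessor and then into this loaded model to produce the Yes/No answer.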
Shivansh Kaushal