End-to-end text classification project: the BERT-based model distilbert/distilbert-base-uncased is fine-tuned on a dataset of human-written and AI-generated text.
- Model - distilbert/distilbert-base-uncased
- Dataset - AI Vs Human Text from Kaggle: https://www.kaggle.com/datasets/shanegerami/ai-vs-human-text
- Database - MongoDB
- Web framework - Flask
- Deployment - AWS EC2 server

WORKFLOW -
- Update config.yaml
- Update params.yaml
- Update entity
- Update the configuration manager in src config
- Update the components
- Update the pipeline
- Update main.py
- Update app.py
The stages of the project are Data Ingestion, Data Validation, Data Transformation, Model Training, and Model Evaluation; a minimal code sketch for each stage follows the list below.
- Data Ingestion - Data can be downloaded from a server or a database such as MySQL or MongoDB; here it is pulled from MongoDB.
- Data Validation - Checks that the ingested data is present and has the columns required for further processing and training.
- Data Transformation - Processes the raw data into a form suitable for training (e.g., cleaning, tokenization, train/test splitting).
- Model Training - The tokenizer and model (distilbert/distilbert-base-uncased) are used to fine-tune the classifier.
- Model Evaluation - After training, the model is evaluated with the accuracy score from sklearn.metrics.
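
A minimal sketch of the ingestion step, assuming the text lives in a MongoDB collection (the database/collection names, URI, and output path below are placeholders, not the project's actual configuration):

```python
import pandas as pd
from pymongo import MongoClient

def ingest_from_mongo(uri: str) -> pd.DataFrame:
    """Pull every document from MongoDB into a DataFrame."""
    client = MongoClient(uri)
    collection = client["ai_human_text"]["texts"]  # placeholder db/collection
    # Drop Mongo's internal _id field; keep only the text and label fields.
    records = list(collection.find({}, {"_id": 0}))
    return pd.DataFrame(records)

df = ingest_from_mongo("mongodb://localhost:27017")  # placeholder URI
df.to_csv("artifacts/data_ingestion/data.csv", index=False)  # placeholder path
```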
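
The validation step can be sketched as a schema check, assuming the ingested CSV has the Kaggle dataset's text and generated columns (the status-file path is illustrative):

```python
import pandas as pd

REQUIRED_COLUMNS = {"text", "generated"}  # assumed schema of the dataset

def validate(csv_path: str, status_path: str) -> bool:
    """Confirm the required columns exist before training proceeds."""
    df = pd.read_csv(csv_path)
    ok = REQUIRED_COLUMNS.issubset(df.columns)
    with open(status_path, "w") as f:
        f.write(f"Validation status: {ok}\n")
    return ok
```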
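
Transformation can be sketched as tokenizing the text with the DistilBERT tokenizer and splitting the data (max_length and the split ratio are illustrative choices):

```python
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

def transform(df):
    """Tokenize the text column and attach integer labels."""
    ds = Dataset.from_pandas(df)
    ds = ds.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
    )
    # The Trainer expects an integer "labels" column.
    ds = ds.map(lambda row: {"labels": int(row["generated"])})
    return ds.train_test_split(test_size=0.2, seed=42)  # illustrative split

splits = transform(df)  # df from the ingestion sketch above
```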
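
Training can then be wired through the Hugging Face Trainer, continuing from the sketch above (the hyperparameters are illustrative defaults, not the project's tuned values):

```python
from transformers import (
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased",
    num_labels=2,  # human-written vs AI-generated
)

args = TrainingArguments(
    output_dir="artifacts/model_trainer",  # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch
)
trainer.train()
trainer.save_model("artifacts/model_trainer/model")  # placeholder path
```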
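
Evaluation uses the accuracy score from sklearn metrics, as stated above; a sketch on the held-out split:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Predict on the held-out split and compare argmax logits with the true labels.
output = trainer.predict(splits["test"])
preds = np.argmax(output.predictions, axis=-1)
print(f"Test accuracy: {accuracy_score(output.label_ids, preds):.4f}")
```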
After evaluation, the model is deployed on an AWS EC2 Ubuntu instance via CI/CD using GitHub Actions and Docker.
STEPS FOR DEPLOYMENT -
- Log in to the AWS console
- Create an IAM user for deployment (with access to ECR and EC2)
- Create an ECR repository to store the Docker image
- Create an EC2 instance (Ubuntu)
- Open the EC2 instance and install Docker on it
- Configure the EC2 instance as a self-hosted GitHub Actions runner
- Set up GitHub secrets with the AWS credentials, region, and ECR repository details (e.g., AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, ECR_REPOSITORY_NAME)