EMLOV4-Session-04 Assignment - PyTorch Lightning - I
Add a Dockerfile for the project
Create a DevContainer for the project
The Docker image should have your package installed
Use this dataset: https://www.kaggle.com/datasets/khushikhushikhushi/dog-breed-image-dataset
Add eval.py to load the model from a checkpoint and run it on the validation dataset
Must print the validation metrics
Push the repository to GitHub
Use infer.py to run inference on 10 images
Add instructions to README.md:
How to use docker run to train and eval the model
How to Train, Eval, Infer using Docker
Make sure to use Volume Mounts!
Created the required folders for the model, the data loader, and data storage
Used the gdown module to fetch and extract the dataset automatically
Wrote a training script that attaches a checkpoint-saving callback to the model
Wrote an evaluation script that prints the validation metrics, and an inference script that runs on 10 images and stores the outputs in the infer_images folder
Used model_storage as a volume mount for all Docker containers
Build the Docker image:
docker build -t dog_train -f ./Dockerfile .
Docker usage to train, evaluate, and infer:
docker run --rm -v ./model_storage:/workspace/model_storage dog_train python src/train.py --data data --logs logs --ckpt_path model_storage
docker run --rm -v ./model_storage:/workspace/model_storage dog_train python src/eval.py --data data --ckpt_path "model_storage/epoch=0-checkpoint.ckpt"
docker run --rm -v ./model_storage:/workspace/model_storage -v ./infer_images:/workspace/infer_images dog_train python src/infer.py --input_folder data/dataset/val/ --output_folder infer_images --ckpt_path "model_storage/epoch=0-checkpoint.ckpt"
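The three docker run commands above can be wrapped in a single script. The sketch below is a hypothetical helper (run_dog.sh is not part of the repository); it assumes the image tag dog_train, that checkpoints land in ./model_storage, and it picks the newest .ckpt file instead of hard-coding epoch=0-checkpoint.ckpt. It mounts absolute host paths, since some older Docker releases do not accept relative bind-mount sources such as ./model_storage. Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# run_dog.sh: hypothetical wrapper around the docker run commands above.
# Assumptions: image tag dog_train, checkpoints written to ./model_storage,
# DRY_RUN=1 prints each command instead of running it.
set -euo pipefail

IMAGE="dog_train"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the newest .ckpt in the given host directory as a path relative to
# /workspace inside the container (e.g. model_storage/epoch=0-checkpoint.ckpt).
latest_ckpt() {
  local dir="$1" newest
  newest="$(ls -t "$dir"/*.ckpt 2>/dev/null | head -n 1 || true)"
  if [ -z "$newest" ]; then
    echo "no checkpoint found in $dir" >&2
    return 1
  fi
  echo "model_storage/$(basename "$newest")"
}

main() {
  # Absolute host paths, because some older Docker releases reject
  # relative bind-mount sources such as ./model_storage.
  local model_dir="$PWD/model_storage" infer_dir="$PWD/infer_images" ckpt
  # Create the folders up front so Docker does not create them owned by root.
  mkdir -p "$model_dir" "$infer_dir"
  case "${1:-}" in
    train)
      run docker run --rm -v "$model_dir:/workspace/model_storage" "$IMAGE" \
        python src/train.py --data data --logs logs --ckpt_path model_storage
      ;;
    eval)
      ckpt="$(latest_ckpt "$model_dir")" || return 1
      run docker run --rm -v "$model_dir:/workspace/model_storage" "$IMAGE" \
        python src/eval.py --data data --ckpt_path "$ckpt"
      ;;
    infer)
      ckpt="$(latest_ckpt "$model_dir")" || return 1
      run docker run --rm -v "$model_dir:/workspace/model_storage" \
        -v "$infer_dir:/workspace/infer_images" "$IMAGE" \
        python src/infer.py --input_folder data/dataset/val/ \
        --output_folder infer_images --ckpt_path "$ckpt"
      ;;
    *)
      echo "usage: $0 {train|eval|infer}" >&2
      return 1
      ;;
  esac
}

# Only dispatch when arguments are given, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

For example, bash run_dog.sh train followed by bash run_dog.sh eval trains and then evaluates, and DRY_RUN=1 bash run_dog.sh infer shows the inference command without running it. Picking the newest file is a simplification; if the checkpoint callback keeps several checkpoints, pass the intended one to eval.py explicitly.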
Run uv as root, or call it as ~/.local/bin/uv when working inside the dev container
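The uv note above can be turned into a small helper that works both as root and inside the dev container. This is a hypothetical sketch (uv_bin is not from the repository); it assumes uv is on PATH when running as root and is installed at ~/.local/bin/uv for the dev-container user, where it may not be on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the uv binary to use.
# Assumption: root has uv on PATH; the dev-container user has it
# installed at ~/.local/bin/uv, which may not be on PATH.
set -euo pipefail

uv_bin() {
  if command -v uv >/dev/null 2>&1; then
    # uv is on PATH (e.g. when running as root).
    command -v uv
  elif [ -x "$HOME/.local/bin/uv" ]; then
    # Fall back to the per-user install inside the dev container.
    echo "$HOME/.local/bin/uv"
  else
    echo "uv not found on PATH or in ~/.local/bin" >&2
    return 1
  fi
}
```

For example, "$(uv_bin)" sync installs the project's dependencies with whichever binary was found, assuming the project is managed through a pyproject.toml.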
Writing custom model saving classes
Ajith Kumar V (myself)
Aakash Vardhan
Anvesh Vankayala
Manjunath Yelipeta
Abhijith Kumar K P
Sabitha Devarajulu