All the code for the hands-on exercises can be found in this repository.
To request an account on Zaratan, please join Slack at the link above and fill out this Google form.
We have pre-built the dependencies required for this tutorial on Zaratan. They are activated automatically when you run the bash scripts.
Model weights and the training dataset have been downloaded to /scratch/zt1/project/sc24/shared/.
CONFIG_FILE=configs/single_gpu.json sbatch --ntasks-per-node=1 train.sh
Open configs/single_gpu.json and change precision to bf16-mixed, then run:
CONFIG_FILE=configs/single_gpu.json sbatch --ntasks-per-node=1 train.sh
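The config edit described above can also be scripted instead of done by hand. The following is a minimal sketch, assuming the config file is plain JSON and that the key being changed (here precision) sits at the top level of the file; the actual layout of configs/single_gpu.json is not shown in this document. set_config_value is a hypothetical helper, not part of the tutorial code.

```python
import json
from pathlib import Path


def set_config_value(config_path, key, value):
    """Set a top-level key in a JSON config file and return the updated dict.

    Hypothetical helper: assumes the file is plain JSON and that `key`
    already exists at the top level (an assumption about the config layout).
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    if key not in config:
        # Fail loudly rather than silently adding a key the trainer ignores.
        raise KeyError(f"{key!r} not found at the top level of {path}")
    config[key] = value
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config


# Example call (run from the repository root):
# set_config_value("configs/single_gpu.json", "precision", "bf16-mixed")
```

If the other settings mentioned in this tutorial (such as compile or tp_dimensions) are also top-level keys, the same helper could be reused for them by passing True or a list such as [4, 1, 1] as the value.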
CONFIG_FILE=configs/ddp.json sbatch --ntasks-per-node=4 train.sh
CONFIG_FILE=configs/fsdp.json sbatch --ntasks-per-node=4 train.sh
CONFIG_FILE=configs/axonn.json sbatch --ntasks-per-node=4 train.sh
Add more prompts to data/inference/prompts.txt if you want. Then run:
CONFIG_FILE=configs/inference_axonn.json sbatch --ntasks-per-node=1 infer.sh
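Prompts can be added to the prompt file with any text editor. As an alternative, here is a small sketch that appends prompts programmatically. It assumes the file holds one prompt per line, which is a guess about the format rather than something this document states; append_prompts is a hypothetical helper.

```python
from pathlib import Path


def append_prompts(prompts_path, new_prompts):
    """Append prompts to a text file, one per line, and return all prompts.

    Assumes one prompt per line (an assumption about the file format).
    Blank entries and prompts already present in the file are skipped.
    """
    path = Path(prompts_path)
    existing = path.read_text().splitlines() if path.exists() else []
    to_add = []
    for prompt in new_prompts:
        prompt = prompt.strip()
        if prompt and prompt not in existing and prompt not in to_add:
            to_add.append(prompt)
    all_prompts = existing + to_add
    # Write everything back with a trailing newline when there is content.
    path.write_text("\n".join(all_prompts) + "\n" if all_prompts else "")
    return all_prompts


# Example call (run from the repository root):
# append_prompts("data/inference/prompts.txt", ["Explain tensor parallelism."])
```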
Open configs/axonn_inference.json and change compile to true. Then run:
CONFIG_FILE=configs/inference_axonn.json sbatch --ntasks-per-node=1 infer.sh
Open configs/axonn_inference.json and change tp_dimensions to [4, 1, 1]. Then run:
CONFIG_FILE=configs/inference_axonn.json sbatch --ntasks-per-node=4 infer.sh
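The step above pairs tp_dimensions of [4, 1, 1] with --ntasks-per-node=4. A quick sanity check before submitting the job can catch a mismatch between the two. The sketch below assumes that the product of the tensor-parallel dimensions should equal the number of launched tasks when no data parallelism is used; this rule is an assumption made for illustration and is not stated in this document. tp_matches_tasks is a hypothetical helper.

```python
import math


def tp_matches_tasks(tp_dimensions, ntasks):
    """Return True if the product of tp_dimensions equals ntasks.

    Assumption: one task per GPU and no data parallelism, so the
    tensor-parallel grid must cover exactly `ntasks` processes.
    """
    return math.prod(tp_dimensions) == ntasks


# The setting from this step: a 4 x 1 x 1 grid launched with 4 tasks.
print(tp_matches_tasks([4, 1, 1], 4))
```

Under the same assumption, launching the [4, 1, 1] configuration with --ntasks-per-node=1 would fail this check.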