From 92cda7504c25686a9743bc4e8de55720977d53a8 Mon Sep 17 00:00:00 2001
From: rasbt
Date: Mon, 1 Apr 2024 23:20:51 +0000
Subject: [PATCH] pretrain starter docs

---
 README.md                |  8 ++++----
 tutorials/0_to_litgpt.md |  1 +
 tutorials/pretrain.md    | 42 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 162512b914..c06e792578 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@
 
 ✅  Optimized and efficient code: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, [optional CPU offloading](tutorials/oom.md#do-sharding-across-multiple-gpus), and [TPU and XLA support](extensions/xla).
 
-✅  [Pretraining](tutorials/pretrain_tinyllama.md), [finetuning](tutorials/finetune.md), and [inference](tutorials/inference.md) in various precision settings: FP32, FP16, BF16, and FP16/FP32 mixed.
+✅  [Pretraining](tutorials/pretrain.md), [finetuning](tutorials/finetune.md), and [inference](tutorials/inference.md) in various precision settings: FP32, FP16, BF16, and FP16/FP32 mixed.
 
 ✅  [Configuration files](config_hub) for great out-of-the-box performance.
 
@@ -37,7 +37,7 @@
 
 ✅  [Exporting](tutorials/convert_lit_models.md) to other popular model weight formats.
 
-✅  Many popular datasets for [pretraining](tutorials/pretrain_tinyllama.md) and [finetuning](tutorials/prepare_dataset.md), and [support for custom datasets](tutorials/prepare_dataset.md#preparing-custom-datasets-for-instruction-finetuning).
+✅  Many popular datasets for [pretraining](tutorials/pretrain.md) and [finetuning](tutorials/prepare_dataset.md), and [support for custom datasets](tutorials/prepare_dataset.md#preparing-custom-datasets-for-instruction-finetuning).
 
 ✅  Readable and easy-to-modify code to experiment with the latest research ideas.
 
@@ -114,7 +114,7 @@ For more information, refer to the [download](tutorials/download_model_weights.m
 
 ## Finetuning and pretraining
 
-LitGPT supports [pretraining](tutorials/pretrain_tinyllama.md) and [finetuning](tutorials/finetune.md) to optimize models on excisting or custom datasets. Below is an example showing how to finetune a model with LoRA:
+LitGPT supports [pretraining](tutorials/pretrain.md) and [finetuning](tutorials/finetune.md) to optimize models on existing or custom datasets. Below is an example showing how to finetune a model with LoRA:
 
 ```bash
 # 1) Download a pretrained model
@@ -336,7 +336,7 @@ If you have general questions about building with LitGPT, please [join our Disco
 Tutorials and in-depth feature documentation can be found below:
 
 - Finetuning, incl. LoRA, QLoRA, and Adapters ([tutorials/finetune.md](tutorials/finetune.md))
-- Pretraining ([tutorials/pretrain_tinyllama.md](tutorials/pretrain_tinyllama.md))
+- Pretraining ([tutorials/pretrain.md](tutorials/pretrain.md))
 - Model evaluation ([tutorials/evaluation.md](tutorials/evaluation.md))
 - Supported and custom datasets ([tutorials/prepare_dataset.md](tutorials/prepare_dataset.md))
 - Quantization ([tutorials/quantize.md](tutorials/quantize.md))
diff --git a/tutorials/0_to_litgpt.md b/tutorials/0_to_litgpt.md
index 415190fec6..91ec3a7107 100644
--- a/tutorials/0_to_litgpt.md
+++ b/tutorials/0_to_litgpt.md
@@ -125,6 +125,7 @@ litgpt pretrain --help
 
 **More information and additional resources**
 
+- [tutorials/pretrain](./pretrain.md): General information about pretraining in LitGPT
 - [tutorials/pretrain_tinyllama](./pretrain_tinyllama.md): A tutorial for finetuning a 1.1B TinyLlama model on 3 trillion tokens
 - [config_hub/pretrain](../config_hub/pretrain): Pre-made config files for pretraining that work well out of the box
 - Project templates in reproducible environments with multi-GPU and multi-node support:
diff --git a/tutorials/pretrain.md b/tutorials/pretrain.md
index 854320b9c6..fc1d9cbb29 100644
--- a/tutorials/pretrain.md
+++ b/tutorials/pretrain.md
@@ -4,12 +4,54 @@
 
 The simplest way to get started with pretraining LLMs in LitGPT ...
 
+&nbsp;
+## The Pretraining API
+
+You can pretrain models in LitGPT using the `litgpt pretrain` API, starting from any of the available model architectures. To list them, run `litgpt pretrain` without any additional arguments:
+
+```bash
+litgpt pretrain
+```
+
+Shown below is an abbreviated version of the output, which lists the available models:
+
+```
+ValueError: Please specify --model_name <model_name>. Available values:
+Camel-Platypus2-13B
+...
+Gemma-2b
+...
+Llama-2-7b-hf
+...
+Mixtral-8x7B-v0.1
+...
+pythia-14m
+```
+
+For demonstration purposes, we can pretrain a small, 14-million-parameter Pythia model on the TinyStories dataset using the [debug.yaml config file](https://github.com/Lightning-AI/litgpt/blob/main/config_hub/pretrain/debug.yaml) as follows:
+
+```bash
+litgpt pretrain \
+  --model_name pythia-14m \
+  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/pretrain/debug.yaml
+```
+
+
+
 &nbsp;
 ## Pretrain a 1.1B TinyLlama model
 
 You can find an end-to-end LitGPT tutorial for pretraining a TinyLlama model using LitGPT [here](pretrain_tinyllama.md).
 
+&nbsp;
+## Optimize LitGPT pretraining with Lightning Thunder
+
+[Lightning Thunder](https://github.com/Lightning-AI/lightning-thunder) is a source-to-source compiler for PyTorch that is fully compatible with LitGPT. In experiments, Thunder delivered a 40% speed-up over regular PyTorch when finetuning a 7B Llama 2 model.
+
+For more information, see the [Lightning Thunder extension README](https://github.com/Lightning-AI/lightning-thunder).
+
 &nbsp;
 ## Project templates
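As a possible follow-up to the `debug.yaml` example documented above, individual values from a pretraining config can typically be overridden directly on the command line. The sketch below is illustrative and not part of this patch; the `--train.max_tokens` and `--out_dir` overrides are assumptions about the `litgpt pretrain` interface and should be verified with `litgpt pretrain --help`:

```bash
# Illustrative sketch (assumed flags): start from the debug.yaml config but
# shorten the run and redirect the checkpoints by overriding config values
# on the command line. Verify the exact option names via `litgpt pretrain --help`.
litgpt pretrain \
  --model_name pythia-14m \
  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/pretrain/debug.yaml \
  --train.max_tokens 100000 \
  --out_dir out/pretrain/debug-run
```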