Merge branch 'main' into litgpt-eval
rasbt authored Apr 3, 2024
2 parents 052d097 + 70218de commit e6f8dc3
Showing 67 changed files with 1,939 additions and 440 deletions.
8 changes: 3 additions & 5 deletions .github/workflows/cpu-tests.yml
@@ -37,7 +37,7 @@ jobs:
- uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

@@ -46,9 +46,7 @@ jobs:

- name: Install minimal dependencies
run: |
-        # uv pip install . is not yet supported, only `-e .`
-        # https://github.com/astral-sh/uv/issues/1896
-        uv pip install --system -e .
+        uv pip install --system .
uv pip list
# make sure all modules are still importable with only the minimal dependencies available
modules=$(
@@ -61,7 +59,7 @@ jobs:
- name: Install all dependencies
run: |
-        uv pip install --system -e '.[all,test]' 'lm_eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@115206dc89dad67b8b'
+        uv pip install --system '.[all,test]' 'lm_eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@115206dc89dad67b8b'
uv pip list
- name: Run tests
32 changes: 21 additions & 11 deletions README.md
@@ -27,21 +27,20 @@

 Optimized and efficient code: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, [optional CPU offloading](tutorials/oom.md#do-sharding-across-multiple-gpus), and [TPU and XLA support](extensions/xla).

- [Pretraining](tutorials/pretraining.md), [finetuning](tutorials/finetune.md), and [inference](tutorials/inference.md) in various precision settings: FP32, FP16, BF16, and FP16/FP32 mixed.
+ [Pretraining](tutorials/pretrain.md), [finetuning](tutorials/finetune.md), and [inference](tutorials/inference.md) in various precision settings: FP32, FP16, BF16, and FP16/FP32 mixed.

 [Configuration files](config_hub) for great out-of-the-box performance.

 Efficient finetuning: [LoRA](tutorials/finetune_lora.md), [QLoRA](tutorials/finetune_lora.md), [Adapter](tutorials/finetune_adapter.md), and [Adapter v2](tutorials/finetune_adapter.md).

 [Quantization](tutorials/quantize.md): 4-bit floats, 8-bit integers, and double quantization.

- [Exporting](https://github.com/Lightning-AI/litgpt/blob/wip/tutorials/convert_lit_models.md) to other popular model weight formats.
+ [Exporting](tutorials/convert_lit_models.md) to other popular model weight formats.

- Many popular datasets for [pretraining](tutorials/pretrain_tinyllama.md) and [finetuning](tutorials/prepare_dataset.md), and [support for custom datasets](tutorials/prepare_dataset.md#preparing-custom-datasets-for-instruction-finetuning).
+ Many popular datasets for [pretraining](tutorials/pretrain.md) and [finetuning](tutorials/prepare_dataset.md), and [support for custom datasets](tutorials/prepare_dataset.md#preparing-custom-datasets-for-instruction-finetuning).

 Readable and easy-to-modify code to experiment with the latest research ideas.


 
<br>
&nbsp;
@@ -59,8 +58,6 @@ The following [Lightning Studio](https://lightning.ai/lightning-ai/studios) temp





&nbsp;
<br>
&nbsp;
@@ -107,9 +104,17 @@ For more information, refer to the [download](tutorials/download_model_weights.m


&nbsp;

> [!NOTE]
> We recommend starting with the **[Zero to LitGPT: Getting Started with Pretraining, Finetuning, and Using LLMs](tutorials/0_to_litgpt.md)** tutorial if you are new to LitGPT.


&nbsp;

## Finetuning and pretraining

-LitGPT supports [pretraining](tutorials/pretrain_tinyllama.md) and [finetuning](tutorials/finetune.md) to optimize models on existing or custom datasets. Below is an example showing how to finetune a model with LoRA:
+LitGPT supports [pretraining](tutorials/pretrain.md) and [finetuning](tutorials/finetune.md) to optimize models on existing or custom datasets. Below is an example showing how to finetune a model with LoRA:

```bash
# 1) Download a pretrained model
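# (The diff collapses the remaining commands of this example. A plausible
# completion, sketched from the LitGPT CLI of this period — the model name
# and output paths below are illustrative assumptions:)
litgpt download --repo_id microsoft/phi-2

# 2) Finetune the model with LoRA
litgpt finetune lora \
  --checkpoint_dir checkpoints/microsoft/phi-2 \
  --out_dir out/phi-2-lora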
```

@@ -134,7 +139,7 @@ LitGPT also allows users to use configuration files in YAML format instead of sp

```bash
litgpt finetune lora \
-  --config https://github.com/Lightning-AI/litgpt/blob/wip/config_hub/finetune/llama-2-7b/lora.yaml
+  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml
```

For added convenience, you can also manually override config file settings via the CLI:
@@ -146,7 +151,7 @@ litgpt finetune lora \
--lora_r 4
```

-You can browse the available configuration files [here](https://github.com/Lightning-AI/litgpt/tree/main/config_hub).
+You can browse the available configuration files [here](config_hub).

&nbsp;

@@ -324,8 +329,14 @@ If you have general questions about building with LitGPT, please [join our Disco
## Tutorials, how-to guides, and docs
> [!NOTE]
> We recommend starting with the **[Zero to LitGPT: Getting Started with Pretraining, Finetuning, and Using LLMs](tutorials/0_to_litgpt.md)** tutorial if you are new to LitGPT.

Tutorials and in-depth feature documentation can be found below:
- Finetuning, incl. LoRA, QLoRA, and Adapters ([tutorials/finetune.md](tutorials/finetune.md))
-- Pretraining ([tutorials/pretrain_tinyllama.md](tutorials/pretrain_tinyllama.md))
+- Pretraining ([tutorials/pretrain.md](tutorials/pretrain.md))
- Model evaluation ([tutorials/evaluation.md](tutorials/evaluation.md))
- Supported and custom datasets ([tutorials/prepare_dataset.md](tutorials/prepare_dataset.md))
- Quantization ([tutorials/quantize.md](tutorials/quantize.md))
@@ -401,4 +412,3 @@ If you use LitGPT in your research, please cite the following work:
## License

LitGPT is released under the [Apache 2.0](https://github.com/Lightning-AI/litgpt/blob/main/LICENSE) license.

45 changes: 36 additions & 9 deletions config_hub/finetune/README.md
@@ -1,6 +1,6 @@
## Config files

The table below lists the performance you can expect from the provided config files. Note that you can achieve lower memory consumption by lowering the micro batch size as needed. In addition, you can lower the rank (`lora_r`) in the LoRA configuration files and disable LoRA for certain layers (for example, setting `lora_projection` and other LoRA layer-specific parameters to `false`).
For more information, see the [Dealing with out-of-memory (OOM) errors](../../tutorials/oom.md) guide on lowering the memory requirements.

&nbsp;
@@ -11,29 +11,56 @@ For more information, see the [Dealing with out-of-memory (OOM) errors](../../tu
| Config | Size | Dataset | Epochs | Val loss | Peak memory | Max seq length | Micro batch size | Precision | Training runtime |
| ------ | ---- | ------- | ------ | -------- | ----------- | -------------- | ---------------- | --------- | ---------------- |
| falcon-7b/lora.yaml | 7B | Alpaca 2k | 4 | 0.945 | 16.69 GB | 512 | 2 | bfloat16 | 24.88 min (1xA10G) |
| falcon-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.993 | 9.44 GB | 512 | 2 | bfloat16 | 50.76 min (1xA10G) |
| | | | | | | | | | |
-| gemma-2b/lora.yaml | 2B | Alpaca 2k | 3 | 1.476 | 12.62 GB | 512 | 2 | bfloat16 | 18.31 min (1xA10G) |
-| gemma-2b/qlora.yaml | 2B | Alpaca 2k | 3 | 1.626 | 11.51 GB | 512 | 2 | bfloat16 | 25.29 min (1xA10G) |
-| gemma-2b/full.yaml | 2B | Alpaca 2k | 0.35 | 1.046 | 18.47 GB | 512 | 2 | bfloat16 | 16.79 min (2xA10G) |
+| gemma-2b/lora.yaml | 2B | Alpaca 2k | 2 | 1.476 | 12.62 GB | 512 | 2 | bfloat16 | 9.29 min (1xA10G) |
+| gemma-2b/qlora.yaml | 2B | Alpaca 2k | 2 | 0.981 | 11.59 GB | 512 | 2 | bfloat16 | 12.90 min (1xA10G) |
+| gemma-2b/full.yaml | 2B | Alpaca 2k | 0.35 | 0.990 | 17.43 GB | 512 | 1 | bfloat16 | 13.61 min (4xA10G) |
| | | | | | | | | | |
+| gemma-7b/lora.yaml | 7B | Alpaca 2k | 2 | 0.903 | 25.30 GB | 512 | 1 | bfloat16 | 11.47 min (1xA100) |
+| gemma-7b/qlora.yaml | 7B | Alpaca 2k | 2 | 0.951 | 17.31 GB | 512 | 1 | bfloat16 | 23.46 min (1xA100) |
| | | | | | | | | | |
| llama-2-7b/lora.yaml | 7B | Alpaca 2k | 4 | 0.802 | 19.77 GB | 512 | 2 | bfloat16 | 32.75 min (A10G) |
| llama-2-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.814 | 13.68 GB | 512 | 2 | bfloat16 | 45.68 min (A10G) |
| llama-2-7b/full.yaml | 7B | Alpaca 2k | 1 | 0.941 | 26.81 GB | 512 | 4 | bfloat16 | 1.78 min (4xA100) |
| | | | | | | | | | |
-| mistral-7b/lora.yaml | 7B | Alpaca 2k | 4 | 0.796 | 20.65 GB | 512 | 2 | bfloat16 | 31.04 min (1xA10G) |
-| mistral-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.803 | 14.29 GB | 512 | 2 | bfloat16 | 44.69 min (1xA10G) |
+| mistral-7b/lora.yaml (v0.1) | 7B | Alpaca 2k | 4 | 0.796 | 20.65 GB | 512 | 2 | bfloat16 | 31.04 min (1xA10G) |
+| mistral-7b/qlora.yaml (v0.1) | 7B | Alpaca 2k | 4 | 0.803 | 14.29 GB | 512 | 2 | bfloat16 | 44.69 min (1xA10G) |
| | | | | | | | | | |
+| mistral-7b-v0.2/lora.yaml | 7B | Alpaca 2k | 4 | 0.801 | 20.65 GB | 512 | 2 | bfloat16 | 30.96 min (1xA10G) |
+| mistral-7b-v0.2/qlora.yaml | 7B | Alpaca 2k | 4 | 0.813 | 14.29 GB | 512 | 2 | bfloat16 | 44.68 min (1xA10G) |
| | | | | | | | | | |
| phi-2/lora.yaml | 2B | Alpaca 2k | 1 | 0.832 | 13.98 GB | 512 | 4 | bfloat16 | 3.82 min (1xA10G) |
| phi-2/qlora.yaml | 2B | Alpaca 2k | 1 | 0.846 | 14.27 GB | 512 | 4 | bfloat16 | 4.55 min (1xA10G) |
| phi-2/full.yaml | 2B | Alpaca 2k | 1 | 0.937 | 14.44 GB | 512 | 4 | bfloat16 | 13.00 min (1xA10G) |
| | | | | | | | | | |
-| stablelm-base-alpha-3b/lora.yaml | 7B | Alpaca 2k | 4 | 1.367 | 8.58 GB | 512 | 2 | bfloat16 | 13.02 min (1xA10G) |
-| stablelm-base-alpha-3b/qlora.yaml | 7B | Alpaca 2k | 4 | 1.392 | 5.24 GB | 512 | 2 | bfloat16 | 25.71 min (1xA10G) |
-| stablelm-base-alpha-3b/full.yaml | 7B | Alpaca 2k | 1 | 1.494 | 21.23 GB | 512 | 1 | bfloat16 | 72.72 min (2xA10G) |
+| stablelm-base-alpha-3b/lora.yaml | 3B | Alpaca 2k | 4 | 1.367 | 8.58 GB | 512 | 2 | bfloat16 | 13.02 min (1xA10G) |
+| stablelm-base-alpha-3b/qlora.yaml | 3B | Alpaca 2k | 4 | 1.392 | 5.24 GB | 512 | 2 | bfloat16 | 25.71 min (1xA10G) |
+| stablelm-base-alpha-3b/full.yaml | 3B | Alpaca 2k | 1 | 1.494 | 21.23 GB | 512 | 1 | bfloat16 | 72.72 min (2xA10G) |
| | | | | | | | | | |
| tiny-llama/lora.yaml | 1.1B | Alpaca 2k | 3 | 1.038 | 13.50 GB | 512 | 8 | bfloat16 | 8.06 min (1xA10G) |
| tiny-llama/qlora.yaml | 1.1B | Alpaca 2k | 3 | 1.056 | 16.24 GB | 512 | 8 | bfloat16 | 8.74 min (1xA10G) |
| tiny-llama/full.yaml | 1.1B | Alpaca 2k | 1 | 1.105 | 14.10 GB | 512 | 4 | bfloat16 | 2.59 min (1xA10G) |

&nbsp;
## Extending the context length

If you require a longer sequence length than the one used in a given config file, you can either edit `max_seq_length` in the config file or override it when running the finetuning command, for example by passing `--max_seq_length 4096`, as in the sketch below.
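A sketch of the command-line override (the config path here is one of the files from the table above):

```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --max_seq_length 4096
```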

&nbsp;
## Training on GPUs without bfloat16 support

If you are training on GPUs without bfloat16 support, you need to change the `precision` option to `16-true` (16-bit floating point precision) or `16-mixed` (16/32-bit mixed precision):

```bash
litgpt finetune lora \
--config config_hub/finetune/phi-2/lora.yaml \
--precision 16-true
```
or

```bash
litgpt finetune lora \
--config config_hub/finetune/phi-2/lora.yaml \
--precision 16-mixed
```

Note that `16-true` is more compute- and memory-efficient, but it can sometimes lead to training convergence issues. In that case, it's recommended to use `16-mixed`.
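If you are unsure whether your GPU supports bfloat16, you can query the PyTorch runtime that LitGPT builds on (a quick check; assumes a CUDA GPU):

```bash
# Prints True on bfloat16-capable GPUs (e.g., A100/A10G), False otherwise
python -c "import torch; print(torch.cuda.is_bf16_supported())"
```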
8 changes: 4 additions & 4 deletions config_hub/finetune/gemma-2b/full.yaml
@@ -9,7 +9,7 @@ out_dir: out/finetune/full-gemma-2b
precision: bf16-true

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
-devices: 1
+devices: 4

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
@@ -32,7 +32,7 @@ train:
log_interval: 1

# Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
-  global_batch_size: 6
+  global_batch_size: 16

# Number of samples per data-parallel rank (type: int, default: 4)
micro_batch_size: 1
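(With the new values above — `devices: 4`, `global_batch_size: 16`, and `micro_batch_size: 1` — each optimizer step accumulates 16 / (1 × 4) = 4 micro-batches per device, following the relationship global batch = micro batch × devices × accumulation steps implied by the comments above.)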
@@ -41,13 +41,13 @@ train:
lr_warmup_steps: 100

# Number of epochs to train on (type: Optional[int], default: 5)
-  epochs: 3
+  epochs: 1

# Total number of tokens to train on (type: Optional[int], default: null)
max_tokens:

# Limits the number of optimizer steps to run. (type: Optional[int], default: null)
-  max_steps:
+  max_steps: 50
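(With both `epochs: 1` and `max_steps: 50` set, training stops at whichever limit is reached first — `max_steps` caps the number of optimizer steps, per the comment above.)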

# Limits the length of samples. Off by default (type: Optional[int], default: null)
max_seq_length: 512
4 changes: 2 additions & 2 deletions config_hub/finetune/gemma-2b/lora.yaml
@@ -15,7 +15,7 @@ quantize:
devices: 1

# The LoRA rank. (type: int, default: 8)
-lora_r: 16
+lora_r: 8

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16
@@ -71,7 +71,7 @@ train:
lr_warmup_steps: 200

# Number of epochs to train on (type: Optional[int], default: 5)
-  epochs: 4
+  epochs: 2

# Total number of tokens to train on (type: Optional[int], default: null)
max_tokens:
2 changes: 1 addition & 1 deletion config_hub/finetune/gemma-2b/qlora.yaml
@@ -71,7 +71,7 @@ train:
lr_warmup_steps: 200

# Number of epochs to train on (type: Optional[int], default: 5)
-  epochs: 4
+  epochs: 2

# Total number of tokens to train on (type: Optional[int], default: null)
max_tokens:
122 changes: 122 additions & 0 deletions config_hub/finetune/gemma-7b/lora.yaml
@@ -0,0 +1,122 @@

# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-7b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-gemma-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 16

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16
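# Note: with lora_r 16 and lora_alpha 16, the effective LoRA scaling factor
# (alpha / r, the common convention in LoRA implementations) is 1.0.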

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
class_path: litgpt.data.Alpaca2k
init_args:
mask_prompt: false
val_split_fraction: 0.03847
prompt_style: alpaca
ignore_index: -100
seed: 42
num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:

# Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
save_interval: 800

# Number of iterations between logging calls (type: int, default: 1)
log_interval: 1

# Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
global_batch_size: 6

# Number of samples per data-parallel rank (type: int, default: 4)
micro_batch_size: 1

# Number of iterations with learning rate warmup active (type: int, default: 100)
lr_warmup_steps: 200

# Number of epochs to train on (type: Optional[int], default: 5)
epochs: 2

# Total number of tokens to train on (type: Optional[int], default: null)
max_tokens:

# Limits the number of optimizer steps to run. (type: Optional[int], default: null)
max_steps:

# Limits the length of samples. Off by default (type: Optional[int], default: null)
max_seq_length: 512

# Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
tie_embeddings:

# (type: float, default: 0.0003)
learning_rate: 0.0002

# (type: float, default: 0.02)
weight_decay: 0.0

# (type: float, default: 0.9)
beta1: 0.9

# (type: float, default: 0.95)
beta2: 0.95

# (type: Optional[float], default: null)
max_norm:

# (type: float, default: 6e-05)
min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:

# Number of optimizer steps between evaluation calls (type: int, default: 100)
interval: 25

# Number of tokens to generate (type: Optional[int], default: 100)
max_new_tokens: 100

# Number of iterations (type: int, default: 100)
max_iters: 100

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337