Merge branch 'main' into carmocca/qlora
rasbt authored Mar 28, 2024
2 parents 3091c2b + 660d936 commit 69e3ec5
Showing 128 changed files with 6,510 additions and 975 deletions.
7 changes: 6 additions & 1 deletion .github/azure-gpu-test.yml
@@ -41,9 +41,14 @@ jobs:
displayName: "Image info & NVIDIA"
- script: |
pip install '.[all,test]'
pip install '.[all,test]' 'lm_eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@115206dc89dad67b8b'
displayName: 'Install dependencies'
- script: |
pip uninstall -y torchvision torchaudio
pip install --pre 'nvfuser-cu121[torch]' --extra-index-url https://pypi.nvidia.com
displayName: 'Install PyTorch nightly'
- bash: |
set -e
pip list
21 changes: 12 additions & 9 deletions .github/workflows/cpu-tests.yml
@@ -16,6 +16,7 @@ defaults:

env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
UV_HTTP_TIMEOUT: 500

jobs:
cpu-tests:
@@ -33,20 +34,22 @@ jobs:
timeout-minutes: 25

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: |
pyproject.toml

- name: Install uv
run: pip install uv

- name: Install minimal dependencies
run: |
pip install .
pip list
# uv pip install . is not yet supported, only `-e .`
# https://github.com/astral-sh/uv/issues/1896
uv pip install --system -e .
uv pip list
# make sure all modules are still importable with only the minimal dependencies available
modules=$(
find litgpt -type f -name "*.py" | \
@@ -58,8 +61,8 @@ jobs:
- name: Install all dependencies
run: |
pip install '.[all,test]'
pip list
uv pip install --system -e '.[all,test]' 'lm_eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@115206dc89dad67b8b'
uv pip list
- name: Run tests
run: |
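A hypothetical sketch of the minimal-dependency import check referenced in the workflow above (the `find`/`sed` pipeline and the loop are assumptions, not the repository's exact script):

```bash
# Hypothetical sketch: turn every .py file in the package into a dotted
# module name, then verify each one imports with only minimal dependencies.
modules=$(
  find litgpt -type f -name "*.py" |
  sed 's#/#.#g; s#\.py$##'
)
for module in $modules; do
  python -c "import ${module}" || exit 1
done
```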
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@ __pycache__
.DS_Store
*.egg-info
build
dist
.venv
.vscode

26 changes: 13 additions & 13 deletions README.md
@@ -25,9 +25,9 @@

 [The latest model weights](tutorials/download_model_weights.md): Gemma, Mistral, Mixtral, Phi 2, Llama 2, Falcon, CodeLlama, and [many more](tutorials/download_model_weights.md).

 Optimized and efficient code: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, [optional CPU offloading](tutorials/oom.md#do-sharding-across-multiple-gpus), and [TPU and XLA support](./xla).
 Optimized and efficient code: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, [optional CPU offloading](tutorials/oom.md#do-sharding-across-multiple-gpus), and [TPU and XLA support](extensions/xla).

 [Pretraining](tutorials/pretraining.md), [finetuning](tutorials/finetuning.md), and [inference](tutorials/inference.md) in various precision settings: FP32, FP16, BF16, and FP16/FP32 mixed.
 [Pretraining](tutorials/pretrain_tinyllama.md), [finetuning](tutorials/finetune.md), and [inference](tutorials/inference.md) in various precision settings: FP32, FP16, BF16, and FP16/FP32 mixed.

 [Configuration files](config_hub) for great out-of-the-box performance.

@@ -51,11 +51,11 @@
The following [Lightning Studio](https://lightning.ai/lightning-ai/studios) templates provide LitGPT tutorials and projects in reproducible environments with multi-GPU and multi-node support:


| | |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| <p align="left">[Prepare the TinyLlama 1T token dataset](https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset) <br> [<img src="./images/3.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset) | [Pretrain LLMs - TinyLlama 1.1B](https://lightning.ai/lightning-ai/studios/pretrain-llms-tinyllama-1-1b) <br> <p align="left">[<img src="./images/4.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/pretrain-llms-tinyllama-1-1b) |
| [Continued Pretraining with TinyLlama 1.1B](https://lightning.ai/lightning-ai/studios/continued-pretraining-with-tinyllama-1-1b) <br> <p align="left">[<img src="./images/1.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/continued-pretraining-with-tinyllama-1-1b) | [Instruction finetuning - TinyLlama 1.1B LLM](https://lightning.ai/lightning-ai/studios/instruction-finetuning-tinyllama-1-1b-llm) <br> <p align="left">[<img src="./images/2.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/instruction-finetuning-tinyllama-1-1b-llm) |
| | |
| | |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <p align="left">[Prepare the TinyLlama 1T token dataset](https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset) <br> [<img src="https://pl-public-data.s3.amazonaws.com/assets_litgpt/readme/3.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset) | [Pretrain LLMs - TinyLlama 1.1B](https://lightning.ai/lightning-ai/studios/pretrain-llms-tinyllama-1-1b) <br> <p align="left">[<img src="https://pl-public-data.s3.amazonaws.com/assets_litgpt/readme/4.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/pretrain-llms-tinyllama-1-1b) |
| [Continued Pretraining with TinyLlama 1.1B](https://lightning.ai/lightning-ai/studios/continued-pretraining-with-tinyllama-1-1b) <br> <p align="left">[<img src="https://pl-public-data.s3.amazonaws.com/assets_litgpt/readme/1.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/continued-pretraining-with-tinyllama-1-1b) | [Instruction finetuning - TinyLlama 1.1B LLM](https://lightning.ai/lightning-ai/studios/instruction-finetuning-tinyllama-1-1b-llm) <br> <p align="left">[<img src="https://pl-public-data.s3.amazonaws.com/assets_litgpt/readme/2.webp" width="300"></p>](https://lightning.ai/lightning-ai/studios/instruction-finetuning-tinyllama-1-1b-llm) |
| | |



@@ -134,14 +134,14 @@ LitGPT also allows users to use configuration files in YAML format instead of sp

```bash
litgpt finetune lora \
--config https://github.com/Lightning-AI/litgpt/blob/wip/config_hub/finetune/llama-2-7b/lora.yaml
--config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml
```

For added convenience, you can also manually override config file settings via the CLI:


```bash
litgpt finetune lora
litgpt finetune lora \
--config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml \
--lora_r 4
```
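
For orientation, a minimal sketch of what such a YAML file might contain. The key names below are assumptions based on the CLI flags and table columns used in this repository, not a copy of the actual `lora.yaml`:

```yaml
# Hypothetical excerpt of a LoRA finetuning config; key names assumed
# from the CLI flags shown above, values illustrative only.
lora_r: 8                # LoRA rank; overridable via --lora_r
lora_projection: false   # disable LoRA for the projection layers
max_seq_length: 512      # maximum sequence length
micro_batch_size: 2      # lower this to reduce peak memory
precision: bf16-true     # bfloat16 training
```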
@@ -344,7 +344,7 @@ helping democratize AI for millions of developers and researchers worldwide.
Using TPUs with Lightning is as straightforward as changing one line of code.
We provide scripts fully optimized for TPUs in the [XLA directory](xla)
We provide scripts fully optimized for TPUs in the [XLA directory](extensions/xla).
@@ -366,16 +366,17 @@ This implementation extends on [Lit-LLaMA](https://github.com/lightning-AI/lit-l
## Community showcase
Checkout the projects below using and building on LitGPT. If you have a project you'd like to add to our this section, please don't hestiate to open a pull request.
Check out the projects below using and building on LitGPT. If you have a project you'd like to add to this section, please don't hesitate to open a pull request.
&nbsp;
**🏆 NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day**
The LitGPT repository was the official starter kit for the [NeurIPS 2023 LLM Efficiency Challenge](https://llm-efficiency-challenge.github.io), which is a competition focused on finetuning an existing non-instruction tuned LLM for 24 hours on a single GPU.
&nbsp;
**TinyLlama: An Open-Source Small Language Model**
**🦙 TinyLlama: An Open-Source Small Language Model**
LitGPT powered the [TinyLlama project](https://github.com/jzhang38/TinyLlama) and [TinyLlama: An Open-Source Small Language Model](https://arxiv.org/abs/2401.02385) research paper.
@@ -400,4 +401,3 @@ If you use LitGPT in your research, please cite the following work:
## License

LitGPT is released under the [Apache 2.0](https://github.com/Lightning-AI/litgpt/blob/main/LICENSE) license.

66 changes: 59 additions & 7 deletions config_hub/finetune/README.md
@@ -1,11 +1,63 @@
## Config files

The table below lists the performance you can expect from the provided config files. Note that you can achieve lower memory consumption by lowering the micro batch size as needed. In addition, you can lower the rank (`lora_r`) in the LoRA configuration files and disable LoRA for certain layers (for example, setting `lora_projection` and other LoRA layer-specific parameters to `false`).
For more information on lowering the memory requirements, see the [Dealing with out-of-memory (OOM) errors](../../tutorials/oom.md) tutorial.
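
For illustration, a sketch of applying these memory-saving overrides from the command line, following the `--config` override pattern shown in the main README (the values are examples, not tuned recommendations):

```bash
litgpt finetune lora \
  --config config_hub/finetune/llama-2-7b/lora.yaml \
  --lora_r 4 \
  --lora_projection false
```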

| | Size | Dataset | Epochs | Val loss | Peak memory | Max seq length | Micro batch size | Precision | Training runtime |
| --------------------- | ---- | --------- | ------ | -------- | ----------- | -------------- | ---------------- | --------- | ---------------- |
| tiny-llama/lora.yaml | 1.1B | Alpaca 2k | 3 | 1.038 | 13.50 GB | 512 | 8 | bfloat16 | 8.06 min (A10G) |
| tiny-llama/qlora.yaml | 1.1B | Alpaca 2k | 3 | 1.056 | 16.24 GB | 512 | 8 | bfloat16 | 8.74 min (A10G) |
| tiny-llama/full.yaml | 1.1B | Alpaca 2k | 1 | 1.105 | 14.10 GB | 512 | 4 | bfloat16 | 2.59 min (A10G) |
| llama-2-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.814 | 13.68 GB | 512 | 2 | bfloat16 | 45.68 min (A10G) |
&nbsp;

| | Size | Dataset | Epochs | Val loss | Peak memory | Max seq length | Micro batch size | Precision | Training runtime |
| --------------------------------- | ---- | --------- | ------ | -------- | ----------- | -------------- | ---------------- | --------- | -------------------|
| | | | | | | | | | |
| falcon-7b/lora.yaml | 7B | Alpaca 2k | 4 | 0.945 | 16.69 GB | 512 | 2 | bfloat16 | 24.88 min (1xA10G) |
| falcon-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.993 | 9.44 GB | 512 | 2 | bfloat16 | 50.76 min (1xA10G) |
| | | | | | | | | | |
| gemma-2b/lora.yaml | 2B | Alpaca 2k | 2 | 1.476 | 12.62 GB | 512 | 2 | bfloat16 | 9.29 min (1xA10G) |
| gemma-2b/qlora.yaml | 2B | Alpaca 2k | 2 | 0.981 | 11.59 GB | 512 | 2 | bfloat16 | 12.90 min (1xA10G) |
| gemma-2b/full.yaml | 2B | Alpaca 2k | 0.35 | 0.990 | 17.43 GB | 512 | 1 | bfloat16 | 13.61 min (4xA10G) |
| | | | | | | | | | |
| gemma-7b/lora.yaml | 7B | Alpaca 2k | 2 | 0.903 | 25.30 GB | 512 | 1 | bfloat16 | 11.47 min (1xA100) |
| gemma-7b/qlora.yaml | 7B | Alpaca 2k | 2 | 0.951 | 17.31 GB | 512 | 1 | bfloat16 | 23.46 min (1xA100) |
| | | | | | | | | | |
| llama-2-7b/lora.yaml | 7B | Alpaca 2k | 4 | 0.802 | 19.77 GB | 512 | 2 | bfloat16 | 32.75 min (A10G) |
| llama-2-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.814 | 13.68 GB | 512 | 2 | bfloat16 | 45.68 min (A10G) |
| llama-2-7b/full.yaml | 7B | Alpaca 2k | 1 | 0.941 | 26.81 GB | 512 | 4 | bfloat16 | 1.78 min (4xA100) |
| | | | | | | | | | |
| mistral-7b/lora.yaml | 7B | Alpaca 2k | 4 | 0.796 | 20.65 GB | 512 | 2 | bfloat16 | 31.04 min (1xA10G) |
| mistral-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.803 | 14.29 GB | 512 | 2 | bfloat16 | 44.69 min (1xA10G) |
| | | | | | | | | | |
| phi-2/lora.yaml | 2B | Alpaca 2k | 1 | 0.832 | 13.98 GB | 512 | 4 | bfloat16 | 3.82 min (1xA10G) |
| phi-2/qlora.yaml | 2B | Alpaca 2k | 1 | 0.846 | 14.27 GB | 512 | 4 | bfloat16 | 4.55 min (1xA10G) |
| phi-2/full.yaml | 2B | Alpaca 2k | 1 | 0.937 | 14.44 GB | 512 | 4 | bfloat16 | 13.00 min (1xA10G) |
| | | | | | | | | | |
| stablelm-base-alpha-3b/lora.yaml | 3B | Alpaca 2k | 4 | 1.367 | 8.58 GB | 512 | 2 | bfloat16 | 13.02 min (1xA10G) |
| stablelm-base-alpha-3b/qlora.yaml | 3B | Alpaca 2k | 4 | 1.392 | 5.24 GB | 512 | 2 | bfloat16 | 25.71 min (1xA10G) |
| stablelm-base-alpha-3b/full.yaml | 3B | Alpaca 2k | 1 | 1.494 | 21.23 GB | 512 | 1 | bfloat16 | 72.72 min (2xA10G) |
| | | | | | | | | | |
| tiny-llama/lora.yaml | 1.1B | Alpaca 2k | 3 | 1.038 | 13.50 GB | 512 | 8 | bfloat16 | 8.06 min (1xA10G) |
| tiny-llama/qlora.yaml | 1.1B | Alpaca 2k | 3 | 1.056 | 16.24 GB | 512 | 8 | bfloat16 | 8.74 min (1xA10G) |
| tiny-llama/full.yaml | 1.1B | Alpaca 2k | 1 | 1.105 | 14.10 GB | 512 | 4 | bfloat16 | 2.59 min (1xA10G) |

&nbsp;
## Extending the context length

If you require a longer sequence length than the one used in a given config file, you can either edit the `max_seq_length` in the config file or pass an additional argument when running the finetuning command, for example, `--max_seq_length 4096` to override the sequence length provided in the config file.
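
For example, using the phi-2 config referenced in the next section:

```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --max_seq_length 4096
```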

&nbsp;
## Training on GPUs without bfloat16 support

If you are training on GPUs without bfloat16 support, you need to change the `precision` option to `16-true` (16-bit floating-point precision) or `16-mixed` (16/32-bit mixed precision):

```bash
litgpt finetune lora \
--config config_hub/finetune/phi-2/lora.yaml \
--precision 16-true
```
or

```bash
litgpt finetune lora \
--config config_hub/finetune/phi-2/lora.yaml \
--precision 16-mixed
```

Note that `16-true` is more compute- and memory-efficient, but it can sometimes lead to training convergence issues. In that case, `16-mixed` is recommended.