Fix parameter naming issue with Mistral 7B v0.3 #1480

Closed · wants to merge 32 commits

Changes from all commits (32 commits)
141c8bf · OptimizerArgs (#1409) · rasbt, May 23, 2024
2174054 · Pin back to main · carmocca, May 23, 2024
daffef0 · Fix optimizer init with fused=True (#1434) · carmocca, May 23, 2024
66a797a · Fix learning rate calculation in pretrain (#1435) · rasbt, May 23, 2024
dbf7542 · Align readme (#1438) · rasbt, May 24, 2024
1754a2b · Pin litdata (#1440) · rasbt, May 24, 2024
19a0d7a · Fix README.md alignment (#1439) · rasbt, May 24, 2024
221b7ef · Update README.md for one last time (#1442) · rasbt, May 24, 2024
f6654e8 · A more centered look (#1449) · rasbt, May 28, 2024
3fa17fb · New CLI (#1437) · rasbt, May 31, 2024
916775c · Update error message (#1453) · rasbt, May 31, 2024
339cf43 · Explain how to list all available models (#1455) · rasbt, Jun 1, 2024
798d725 · Detect tensor cores (#1456) · rasbt, Jun 1, 2024
e567dbe · Check checkpoint_dir and add `checkpoints` to path (#1454) · rasbt, Jun 4, 2024
fa88952 · Add MicroLlama training support (#1457) · keeeeenw, Jun 4, 2024
0f3bca7 · Streaming for serving with chat's generate function (#1426) · rasbt, Jun 4, 2024
8c7df82 · Fix sequence length bug (#1462) · rasbt, Jun 5, 2024
3e4fb84 · Add `lr_warmup_steps`, `max_steps` values validation (#1460) · shenxiangzhuang, Jun 5, 2024
fe443ba · Fix issue where path in merge_lora is overwritten (#1465) · rasbt, Jun 6, 2024
9538d6a · Option to skip expensive final validation (#1372) · rasbt, Jun 6, 2024
d657908 · Allow batch size "auto" setting in evaluate (#1469) · rasbt, Jun 7, 2024
7be2851 · Warn users when there is a bnb mismatch (#1468) · rasbt, Jun 7, 2024
67e9164 · Allow batch argument with batch recomputation (#1470) · rasbt, Jun 7, 2024
0bb34ab · LitGPT Python API draft (#1459) · rasbt, Jun 7, 2024
8ca46d2 · Bump version for PyPI release (#1476) · rasbt, Jun 10, 2024
d2ba385 · Update download_model_weights.md · rasbt, Jun 11, 2024
3594142 · bumb version to 0.4.1.dev0 · rasbt, Jun 11, 2024
ee9108f · Fix typos in Download Model Weights documentation (#1477) · rasbt, Jun 11, 2024
97ef696 · Merge remote-tracking branch 'upstream/main' into mistral-v0.3 · davmacario, Jun 12, 2024
bbe4cf4 · fix: download safetensors weight mapping · davmacario, Jun 12, 2024
4f5d7fd · fix: use safetensors weight mapping · davmacario, Jun 12, 2024
42670fe · fix: correct extension update in file name · davmacario, Jun 12, 2024
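
The last three commits (bbe4cf4, 4f5d7fd, 42670fe) carry the actual fix: Mistral 7B v0.3 publishes only safetensors weights on the Hugging Face Hub, so the download step has to read the weight mapping from `model.safetensors.index.json` instead of `pytorch_model.bin.index.json`, and then adjust the file extensions it derives from that mapping. The snippet below is only a hedged illustration of that fallback idea, not code from this PR; the function name and control flow are assumptions.

```python
import json

from huggingface_hub import hf_hub_download


def load_weight_map(repo_id: str) -> tuple[dict, bool]:
    """Illustrative fallback: prefer the .bin index, else the safetensors index."""
    try:
        # Repos that ship PyTorch .bin shards publish this index file.
        index_path = hf_hub_download(repo_id, "pytorch_model.bin.index.json")
        uses_safetensors = False
    except Exception:
        # Repos such as mistralai/Mistral-7B-v0.3 publish only safetensors,
        # so fall back to the safetensors weight mapping.
        index_path = hf_hub_download(repo_id, "model.safetensors.index.json")
        uses_safetensors = True

    with open(index_path) as f:
        # Maps each parameter name to the shard file that contains it.
        weight_map = json.load(f)["weight_map"]
    return weight_map, uses_safetensors
```

With the mapping in hand, the per-shard filenames can then be normalized (for example `.safetensors` to `.bin`) before conversion, which appears to be what the final commit message ("correct extension update in file name") refers to.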
5 changes: 3 additions & 2 deletions .gitignore
@@ -18,5 +18,6 @@ wandb
events.out.tfevents*

# test artifacts from tests/test_readme.py
tests/custom_finetuning_dataset.json
tests/custom_texts
**/custom_finetuning_dataset.json
client.py
**/custom_texts/
107 changes: 55 additions & 52 deletions README.md
@@ -7,7 +7,11 @@

Uses the latest state-of-the-art techniques:

✅ flash attention     ✅ fp4/8/16/32     ✅ LoRA, QLoRA, Adapter (v1, v2)     ✅ FSDP     ✅ 1-1000+ GPUs/TPUs
<pre>
✅ flash attention ✅ fp4/8/16/32 ✅ LoRA, QLoRA, Adapter
✅ FSDP ✅ 1-1000+ GPUs/TPUs ✅ 20+ LLMs
</pre>


---

@@ -69,30 +73,34 @@ LitGPT has 🤯 **custom, from-scratch implementations** of [20+ LLMs](tutorials

| Model | Model size | Author | Reference |
|----|----|----|----|
| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
| Danube2 | 1.8B | H2O.ai | [H2O.ai](https://h2o.ai/platform/danube-1-8b/) |
| Dolly | 3B, 7B, 12B | Databricks | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
| Falcon | 7B, 40B, 180B | TII UAE | [TII 2023](https://falconllm.tii.ae) |
| FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) |
| Function Calling Llama 2 | 7B | Trelis | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) |
| Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) |
| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
| LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
| Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch) |
| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
| Phi | 1.3B, 2.7B | Microsoft Research | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |
| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
| Danube2 | 1.8B | H2O.ai | [H2O.ai](https://h2o.ai/platform/danube-1-8b/) |
| Dolly | 3B, 7B, 12B | Databricks | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
| Falcon | 7B, 40B, 180B | TII UAE | [TII 2023](https://falconllm.tii.ae) |
| FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) |
| Function Calling Llama 2 | 7B | Trelis | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) |
| Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) |
| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
| LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
| MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama)
| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
| Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch) |
| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
| Phi | 1.3B, 2.7B | Microsoft Research | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |
| Platypus | 7B, 13B, 70B | Lee et al. | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) |
| Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
| RedPajama-INCITE | 3B, 7B | Together | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| StableLM | 3B, 7B | Stability AI | [Stability AI 2023](https://github.com/Stability-AI/StableLM) |
| StableLM Zephyr | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| TinyLlama | 1.1B | Zhang et al. | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) |
| Vicuna | 7B, 13B, 33B | LMSYS | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)
| Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
| RedPajama-INCITE | 3B, 7B | Together | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| StableLM | 3B, 7B | Stability AI | [Stability AI 2023](https://github.com/Stability-AI/StableLM) |
| StableLM Zephyr | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| TinyLlama | 1.1B | Zhang et al. | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) |
| Vicuna | 7B, 13B, 33B | LMSYS | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/) |

**Tip**: You can list all available models by running the `litgpt download list` command.


</details>

@@ -138,7 +146,7 @@ litgpt serve meta-llama/Meta-Llama-3-8B-Instruct
&nbsp;

### Use an LLM for inference
Use LLMs for inference to test its chatting capabilities, run evaluations, or extract embeddings, etc...
Use LLMs for inference to test its chatting capabilities, run evaluations, or extract embeddings, etc.
Here's an example showing how to use the Phi-2 LLM.

<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-chat">
@@ -148,12 +156,14 @@ Here's an example showing how to use the Phi-2 LLM.
&nbsp;

```bash
# 1) Download a pretrained model
litgpt download --repo_id microsoft/phi-2
# 1) List all available models in litgpt
litgpt download list

# 2) Chat with the model
litgpt chat \
--checkpoint_dir checkpoints/microsoft/phi-2
# 2) Download a pretrained model
litgpt download microsoft/phi-2

# 3) Chat with the model
litgpt chat microsoft/phi-2

>> Prompt: What do Llamas eat?
```
@@ -174,21 +184,19 @@ For more information on the different inference options, refer to the [inference

```bash
# 1) Download a pretrained model
litgpt download --repo_id microsoft/phi-2
litgpt download microsoft/phi-2

# 2) Finetune the model
curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json

litgpt finetune \
--checkpoint_dir checkpoints/microsoft/phi-2 \
litgpt finetune microsoft/phi-2 \
--data JSON \
--data.json_path my_custom_dataset.json \
--data.val_split_fraction 0.1 \
--out_dir out/custom-model

# 3) Chat with the model
litgpt chat \
--checkpoint_dir out/custom-model/final
litgpt chat out/custom-model/final
```

&nbsp;
@@ -208,22 +216,19 @@ curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_text
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Download a tokenizer
litgpt download \
--repo_id EleutherAI/pythia-160m \
litgpt download EleutherAI/pythia-160m \
--tokenizer_only True

# 2) Pretrain the model
litgpt pretrain \
--model_name pythia-160m \
--tokenizer_dir checkpoints/EleutherAI/pythia-160m \
litgpt pretrain EleutherAI/pythia-160m \
--tokenizer_dir EleutherAI/pythia-160m \
--data TextFiles \
--data.train_data_path "custom_texts/" \
--train.max_tokens 10_000_000 \
--out_dir out/custom-model

# 3) Chat with the model
litgpt chat \
--checkpoint_dir out/custom-model/final
litgpt chat out/custom-model/final
```

&nbsp;
@@ -244,21 +249,19 @@ curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_text
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Download a pretrained model
litgpt download --repo_id EleutherAI/pythia-160m
litgpt download EleutherAI/pythia-160m

# 2) Continue pretraining the model
litgpt pretrain \
--model_name pythia-160m \
--tokenizer_dir checkpoints/EleutherAI/pythia-160m \
--initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \
litgpt pretrain EleutherAI/pythia-160m \
--tokenizer_dir EleutherAI/pythia-160m \
--initial_checkpoint_dir EleutherAI/pythia-160m \
--data TextFiles \
--data.train_data_path "custom_texts/" \
--train.max_tokens 10_000_000 \
--out_dir out/custom-model

# 3) Chat with the model
litgpt chat \
--checkpoint_dir out/custom-model/final
litgpt chat out/custom-model/final
```

&nbsp;
@@ -274,11 +277,11 @@ Once you're ready to deploy a finetuned LLM, run this command:

```bash
# locate the checkpoint to your finetuned or pretrained model and call the `serve` command:
litgpt serve --checkpoint_dir path/to/your/checkpoint/microsoft/phi-2
litgpt serve microsoft/phi-2

# Alternative: if you haven't finetuned, download any checkpoint to deploy it:
litgpt download --repo_id microsoft/phi-2
litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2
litgpt download microsoft/phi-2
litgpt serve microsoft/phi-2
```

Test the server in a separate terminal and integrate the model API into your AI product:
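
The README section above ends by asking the reader to test the server from a second terminal; the snippet that follows that sentence is collapsed in this diff view. As a rough, hedged example (not taken from this PR), a request against a locally running `litgpt serve` instance might look like the following, assuming the default `127.0.0.1:8000` address and a `/predict` route that accepts a JSON `prompt` field:

```python
import requests

# Assumes `litgpt serve` is running locally with its default host and port and
# exposes a /predict endpoint that takes a JSON body with a "prompt" key.
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"},
)
response.raise_for_status()
print(response.json())  # the generated text is returned in the JSON payload
```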
33 changes: 21 additions & 12 deletions config_hub/finetune/falcon-7b/lora.yaml
@@ -84,18 +84,6 @@ train:
# Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
tie_embeddings:

# (type: float, default: 0.0003)
learning_rate: 0.0002

# (type: float, default: 0.02)
weight_decay: 0.0

# (type: float, default: 0.9)
beta1: 0.9

# (type: float, default: 0.95)
beta2: 0.95

# (type: Optional[float], default: null)
max_norm:

@@ -117,8 +105,29 @@ eval:
# Whether to evaluate on the validation set at the beginning of the training
initial_validation: false

# Whether to evaluate on the validation set at the end the training
final_validation: true

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:

class_path: torch.optim.AdamW

init_args:

# (type: float, default: 0.001)
lr: 0.0002

# (type: float, default: 0.01)
weight_decay: 0.0

# (type: tuple, default: (0.9,0.999))
betas:
- 0.9
- 0.95
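
Across the config files touched here, the flat `learning_rate` / `weight_decay` / `beta1` / `beta2` fields under `train:` are replaced by an `optimizer:` block in the `class_path` / `init_args` style, which is the convention jsonargparse uses for instantiating a class from a config file. As a hedged sketch of how such a block maps to an actual optimizer instance (assuming nothing about LitGPT's internals):

```python
import importlib

import torch

# Mirrors the YAML above: class_path names the optimizer class,
# init_args holds its constructor arguments.
optimizer_cfg = {
    "class_path": "torch.optim.AdamW",
    "init_args": {"lr": 0.0002, "weight_decay": 0.0, "betas": (0.9, 0.95)},
}


def build_optimizer(params, cfg):
    module_name, _, class_name = cfg["class_path"].rpartition(".")
    optimizer_cls = getattr(importlib.import_module(module_name), class_name)
    return optimizer_cls(params, **cfg["init_args"])


# Example usage with a throwaway model.
model = torch.nn.Linear(4, 4)
optimizer = build_optimizer(model.parameters(), optimizer_cfg)
```

The same pattern repeats in the qlora.yaml and gemma-2b configs below, so any `torch.optim` optimizer could in principle be swapped in by changing `class_path` and its `init_args`.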
33 changes: 21 additions & 12 deletions config_hub/finetune/falcon-7b/qlora.yaml
@@ -86,18 +86,6 @@ train:
# Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
tie_embeddings:

# (type: float, default: 0.0003)
learning_rate: 0.0002

# (type: float, default: 0.02)
weight_decay: 0.0

# (type: float, default: 0.9)
beta1: 0.9

# (type: float, default: 0.95)
beta2: 0.95

# (type: Optional[float], default: null)
max_norm:

@@ -119,8 +107,29 @@ eval:
# Whether to evaluate on the validation set at the beginning of the training
initial_validation: false

# Whether to evaluate on the validation set at the end the training
final_validation: true

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:

class_path: torch.optim.AdamW

init_args:

# (type: float, default: 0.001)
lr: 0.0002

# (type: float, default: 0.01)
weight_decay: 0.0

# (type: tuple, default: (0.9,0.999))
betas:
- 0.9
- 0.95
33 changes: 21 additions & 12 deletions config_hub/finetune/gemma-2b/full.yaml
@@ -55,18 +55,6 @@ train:
# Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
tie_embeddings:

# (type: float, default: 0.0003)
learning_rate: 0.0002

# (type: float, default: 0.02)
weight_decay: 0.0

# (type: float, default: 0.9)
beta1: 0.9

# (type: float, default: 0.95)
beta2: 0.95

# (type: Optional[float], default: null)
max_norm:

@@ -88,8 +76,29 @@ eval:
# Whether to evaluate on the validation set at the beginning of the training
initial_validation: false

# Whether to evaluate on the validation set at the end the training
final_validation: true

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:

class_path: torch.optim.AdamW

init_args:

# (type: float, default: 0.001)
lr: 0.0002

# (type: float, default: 0.01)
weight_decay: 0.0

# (type: tuple, default: (0.9,0.999))
betas:
- 0.9
- 0.95
33 changes: 21 additions & 12 deletions config_hub/finetune/gemma-2b/lora.yaml
@@ -85,18 +85,6 @@ train:
# Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
tie_embeddings:

# (type: float, default: 0.0003)
learning_rate: 0.0002

# (type: float, default: 0.02)
weight_decay: 0.2

# (type: float, default: 0.9)
beta1: 0.9

# (type: float, default: 0.95)
beta2: 0.95

# (type: Optional[float], default: null)
max_norm:

@@ -118,8 +106,29 @@ eval:
# Whether to evaluate on the validation set at the beginning of the training
initial_validation: false

# Whether to evaluate on the validation set at the end the training
final_validation: true

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:

class_path: torch.optim.AdamW

init_args:

# (type: float, default: 0.001)
lr: 0.0002

# (type: float, default: 0.01)
weight_decay: 0.0

# (type: tuple, default: (0.9,0.999))
betas:
- 0.9
- 0.95