Commit 0f53faf: Tutorials

carmocca committed Jan 18, 2024
1 parent 2096bde commit 0f53faf

Showing 3 changed files with 70 additions and 23 deletions.
41 changes: 21 additions & 20 deletions README.md
@@ -26,26 +26,27 @@ Hackable [implementation](lit_gpt/model.py) of state-of-the-art open-source large language models

Supports the following popular model checkpoints:

| Model and usage | Model size | Reference |
|-----------------------------------------------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| EleutherAI [Pythia](tutorials/download_pythia.md) | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
| LMSYS [LongChat](tutorials/download_longchat.md) | 7B, 13B | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
| LMSYS [Vicuna](tutorials/download_vicuna.md) | 7B, 13B, 33B | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/) |
| Meta AI [Code Llama](tutorials/download_code_llama.md) | 7B, 13B, 34B | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
| Meta AI [Llama 2](tutorials/download_llama_2.md) | 7B, 13B, 70B | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
| Mistral AI [Mistral and Mixtral](tutorials/download_mistral.md) | 7B | [Mistral website](https://mistral.ai/) |
| Microsoft Research [Phi](tutorials/download_phi.md) | 1.3B, 2.7B | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |
| NousResearch Nous-Hermes | 7B, 13B, 70B | [Org page](https://huggingface.co/NousResearch) |
| OpenLM Research [OpenLLaMA](tutorials/download_openllama.md) | 3B, 7B, 13B | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
| Platypus | 7B, 13B, 70B | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) |
| Stability AI StableCode | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| Stability AI [FreeWilly2](tutorials/download_freewilly_2.md) (Stable Beluga 2) | 70B | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) |
| Stability AI [StableLM](tutorials/download_stablelm.md) | 3B, 7B | [Stability AI 2023](https://github.com/Stability-AI/StableLM) |
| Stability AI [StableLM Zephyr](tutorials/download_stablelm.md) | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
| TII UAE [Falcon](tutorials/download_falcon.md) | 7B, 40B, 180B | [TII 2023](https://falconllm.tii.ae) |
| [TinyLlama](tutorials/download_tinyllama.md) | 1.1B | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) |
| Together [RedPajama-INCITE](tutorials/download_redpajama_incite.md) | 3B, 7B | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
| Trelis [Function Calling Llama 2](tutorials/download_function_calling_llama_2.md) | 7B | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) |
| databricks [Dolly](tutorials/download_dolly.md) | 3B, 7B, 12B | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |

This implementation builds on [Lit-LLaMA](https://github.com/lightning-AI/lit-llama) and [nanoGPT](https://github.com/karpathy/nanoGPT), and it's **powered by [Lightning Fabric](https://lightning.ai/docs/fabric/stable/)**.

7 changes: 4 additions & 3 deletions lit_gpt/config.py
@@ -294,10 +294,10 @@ def norm_class(self) -> Type:
configs.append(copy)


###################
# databricks Dolly
###################
dolly = [
# https://huggingface.co/databricks/dolly-v2-3b/blob/main/config.json
dict(
name="dolly-v2-3b",
@@ -325,6 +325,7 @@ def norm_class(self) -> Type:
n_head=40,
),
]
configs.extend(dolly)


####################################
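For context, the hunk above follows the registration pattern used throughout `lit_gpt/config.py`: each model family is declared as a list of `dict`s and merged into the module-level `configs` list, which downstream code indexes by `name`. A minimal sketch of that pattern (the hyperparameter values below are illustrative placeholders, not the real Dolly ones, and the `name_to_config` lookup is an assumption about how `configs` is consumed):

```python
# Sketch of the config registration pattern in lit_gpt/config.py.
# Field values are illustrative placeholders; see the Dolly
# config.json files linked in the diff for the real hyperparameters.
configs = []

dolly = [
    dict(name="dolly-v2-3b", n_layer=32, n_head=32),
    dict(name="dolly-v2-7b", n_layer=32, n_head=32),
    dict(name="dolly-v2-12b", n_layer=36, n_head=40),
]
configs.extend(dolly)

# Assumed lookup: checkpoints are resolved by their "name" key.
name_to_config = {config["name"]: config for config in configs}
assert "dolly-v2-12b" in name_to_config
```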
45 changes: 45 additions & 0 deletions tutorials/download_dolly.md
@@ -0,0 +1,45 @@
## Download [Dolly](https://github.com/databrickslabs/dolly) weights

Databricks’ [Dolly](https://huggingface.co/databricks/dolly-v2-12b) is an instruction-following large language model trained on the Databricks machine learning platform
and licensed for commercial use. Based on `pythia-12b`, Dolly is trained on ~15k instruction/response fine-tuning records in
[`databricks-dolly-15k`](https://huggingface.co/datasets/databricks/databricks-dolly-15k), generated
by Databricks employees across the capability domains from the InstructGPT paper: brainstorming, classification, closed QA, generation,
information extraction, open QA, and summarization. `dolly-v2-12b` is not a state-of-the-art model, but it does exhibit surprisingly
high-quality instruction-following behavior not characteristic of the foundation model it is based on.

For detailed info on the models, their training, and their behavior, please see the [Dolly repository](https://github.com/databrickslabs/dolly).

To see all the available Dolly checkpoints, run:

```bash
python scripts/download.py | grep dolly
```

which will print

```text
databricks/dolly-v2-3b
databricks/dolly-v2-7b
databricks/dolly-v2-12b
```

To use a specific Dolly checkpoint, for instance [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b), download the weights and convert the checkpoint to the lit-gpt format:

```bash
pip install huggingface_hub

python scripts/download.py --repo_id databricks/dolly-v2-3b

python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/databricks/dolly-v2-3b
```

By default, `convert_hf_checkpoint.py` will use the data type of the HF checkpoint's parameters. In cases where RAM
or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters to that smaller precision before continuing.
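
For example, the conversion command from above with the flag added:

```bash
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/databricks/dolly-v2-3b --dtype bfloat16
```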

You're done! To run the model, execute:

```bash
pip install tokenizers

python generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/databricks/dolly-v2-3b
```
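
If you'd rather load the converted weights from Python than through the CLI scripts, a minimal sketch could look like this. It assumes the `GPT` and `Config` classes exported by `lit_gpt`, and that the conversion step wrote a `lit_model.pth` file into the checkpoint directory:

```python
import torch
from lit_gpt import GPT, Config

# Build the model skeleton from the registered "dolly-v2-3b" config.
config = Config.from_name("dolly-v2-3b")
model = GPT(config)

# Load the weights written by scripts/convert_hf_checkpoint.py.
state_dict = torch.load("checkpoints/databricks/dolly-v2-3b/lit_model.pth")
model.load_state_dict(state_dict)
model.eval()
```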
