Tutorials

Lightning-AI · Jan 18, 2024 · 82cb624
1 parent 2096bde
commit 82cb624

Showing 3 changed files with 70 additions and 23 deletions.
diff --git a/README.md b/README.md
@@ -26,26 +26,27 @@ Hackable [implementation](lit_gpt/model.py) of state-of-the-art open-source larg
 
 Supports the following popular model checkpoints:
 
-| Model and usage                                                                   | Model size                               | Reference                                                                                        |
-|-----------------------------------------------------------------------------------|------------------------------------------|--------------------------------------------------------------------------------------------------|
-| EleutherAI [Pythia](tutorials/download_pythia.md)                                 | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373)                                         |
-| LMSYS [LongChat](tutorials/download_longchat.md)                                  | 7B, 13B                                  | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/)                                |
-| LMSYS [Vicuna](tutorials/download_vicuna.md)                                      | 7B, 13B, 33B                             | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)                                      |
-| Meta AI [Code Llama](tutorials/download_code_llama.md)                            | 7B, 13B, 34B                             | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950)                                          |
-| Meta AI [Llama 2](tutorials/download_llama_2.md)                                  | 7B, 13B, 70B                             | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                          |
-| Mistral AI [Mistral and Mixtral](tutorials/download_mistral.md)                   | 7B                                       | [Mistral  website](https://mistral.ai/)                                                          |
-| Microsoft Research [Phi](tutorials/download_phi.md)                               | 1.3B, 2.7B                               | [Li et al. 2023](https://arxiv.org/abs/2309.05463)                                               |
-| NousResearch Nous-Hermes                                                          | 7B, 13B, 70B                             | [Org page](https://huggingface.co/NousResearch)                                                  |
-| OpenLM Research [OpenLLaMA](tutorials/download_openllama.md)                      | 3B, 7B, 13B                              | [Geng & Liu 2023](https://github.com/openlm-research/open_llama)                                 |
-| Platypus                                                                          | 7B, 13B, 70B                             | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317)                                   |
-| Stability AI StableCode                                                           | 3B                                       | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)               |
-| Stability AI [FreeWilly2](tutorials/download_freewilly_2.md) (Stable Beluga 2)    | 70B                                      | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) |
-| Stability AI [StableLM](tutorials/download_stablelm.md)                           | 3B, 7B                                   | [Stability AI 2023](https://github.com/Stability-AI/StableLM)                                    |
-| Stability AI [StableLM Zephyr](tutorials/download_stablelm.md)                    | 3B                                       | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)               |
-| TII UAE [Falcon](tutorials/download_falcon.md)                                    | 7B, 40B, 180B                            | [TII 2023](https://falconllm.tii.ae)                                                             |
-| [TinyLlama](tutorials/download_tinyllama.md)                                      | 1.1B                                     | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama)                                       |
-| Together [RedPajama-INCITE](tutorials/download_redpajama_incite.md)               | 3B, 7B                                   | [Together 2023](https://together.ai/blog/redpajama-models-v1)                                    |
-| Trelis [Function Calling Llama 2](tutorials/download_function_calling_llama_2.md) | 7B                                       | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)       |
+| Model and usage                                                                   | Model size                               | Reference                                                                                                                    |
+|-----------------------------------------------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
+| EleutherAI [Pythia](tutorials/download_pythia.md)                                 | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373)                                                                     |
+| LMSYS [LongChat](tutorials/download_longchat.md)                                  | 7B, 13B                                  | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/)                                                            |
+| LMSYS [Vicuna](tutorials/download_vicuna.md)                                      | 7B, 13B, 33B                             | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)                                                                  |
+| Meta AI [Code Llama](tutorials/download_code_llama.md)                            | 7B, 13B, 34B                             | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950)                                                                      |
+| Meta AI [Llama 2](tutorials/download_llama_2.md)                                  | 7B, 13B, 70B                             | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                                                      |
+| Mistral AI [Mistral and Mixtral](tutorials/download_mistral.md)                   | 7B                                       | [Mistral  website](https://mistral.ai/)                                                                                      |
+| Microsoft Research [Phi](tutorials/download_phi.md)                               | 1.3B, 2.7B                               | [Li et al. 2023](https://arxiv.org/abs/2309.05463)                                                                           |
+| NousResearch Nous-Hermes                                                          | 7B, 13B, 70B                             | [Org page](https://huggingface.co/NousResearch)                                                                              |
+| OpenLM Research [OpenLLaMA](tutorials/download_openllama.md)                      | 3B, 7B, 13B                              | [Geng & Liu 2023](https://github.com/openlm-research/open_llama)                                                             |
+| Platypus                                                                          | 7B, 13B, 70B                             | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317)                                                               |
+| Stability AI StableCode                                                           | 3B                                       | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                           |
+| Stability AI [FreeWilly2](tutorials/download_freewilly_2.md) (Stable Beluga 2)    | 70B                                      | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models)                             |
+| Stability AI [StableLM](tutorials/download_stablelm.md)                           | 3B, 7B                                   | [Stability AI 2023](https://github.com/Stability-AI/StableLM)                                                                |
+| Stability AI [StableLM Zephyr](tutorials/download_stablelm.md)                    | 3B                                       | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                           |
+| TII UAE [Falcon](tutorials/download_falcon.md)                                    | 7B, 40B, 180B                            | [TII 2023](https://falconllm.tii.ae)                                                                                         |
+| [TinyLlama](tutorials/download_tinyllama.md)                                      | 1.1B                                     | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama)                                                                   |
+| Together [RedPajama-INCITE](tutorials/download_redpajama_incite.md)               | 3B, 7B                                   | [Together 2023](https://together.ai/blog/redpajama-models-v1)                                                                |
+| Trelis [Function Calling Llama 2](tutorials/download_function_calling_llama_2.md) | 7B                                       | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)                                   |
+| databricks [Dolly](tutorials/download_dolly.md)                                   | 3B, 7B, 12B                              | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
 
 This implementation extends on [Lit-LLaMA](https://github.com/lightning-AI/lit-llama) and [nanoGPT](https://github.com/karpathy/nanoGPT), and it's **powered by [Lightning Fabric](https://lightning.ai/docs/fabric/stable/) ⚡**.
 

diff --git a/lit_gpt/config.py b/lit_gpt/config.py
@@ -294,10 +294,10 @@ def norm_class(self) -> Type:
     configs.append(copy)
 
 
-####################################
+###################
 # databricks Dolly
-####################################
-dolly_v2 = [
+###################
+dolly = [
     # https://huggingface.co/databricks/dolly-v2-3b/blob/main/config.json
     dict(
         name="dolly-v2-3b",
@@ -325,6 +325,7 @@ def norm_class(self) -> Type:
         n_head=40,
     ),
 ]
+configs.extend(dolly)
 
 
 ####################################

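The `lit_gpt/config.py` hunk above renames the list to `dolly` and adds `configs.extend(dolly)`. A minimal sketch of why that call matters, assuming (this is not shown in the diff) that checkpoints are later resolved through a name-keyed lookup built from `configs`. The `build_lookup` helper is hypothetical, and the dicts carry only the `name` field for brevity:

```python
# Illustrative sketch only: `build_lookup` is a hypothetical helper, and the
# config dicts are stripped down to the `name` field. The real entries in
# lit_gpt/config.py carry many more keys.
configs = []

dolly = [
    dict(name="dolly-v2-3b"),
    dict(name="dolly-v2-7b"),
    dict(name="dolly-v2-12b"),
]


def build_lookup(all_configs):
    """Map each config's name to its dict (assumed lookup pattern)."""
    return {cfg["name"]: cfg for cfg in all_configs}


# Before registration, the Dolly names cannot be resolved.
before = build_lookup(configs)

# The line added by the hunk: register the Dolly entries.
configs.extend(dolly)
after = build_lookup(configs)

print("dolly-v2-3b" in before, "dolly-v2-3b" in after)
```

Under that assumption, a config list that is defined but never passed to `configs.extend(...)` stays invisible to any lookup by name.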
diff --git a/tutorials/download_dolly.md b/tutorials/download_dolly.md
@@ -0,0 +1,45 @@
+## Download [Dolly](https://github.com/databrickslabs/dolly) weights
+
+Databricks’ [Dolly](https://huggingface.co/databricks/dolly-v2-12b) is an instruction-following large language model that was trained on the Databricks machine learning platform
+and is licensed for commercial use. Based on `pythia-12b`, Dolly is fine-tuned on ~15k instruction/response records
+([`databricks-dolly-15k`](https://huggingface.co/datasets/databricks/databricks-dolly-15k)) generated
+by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation,
+information extraction, open QA, and summarization. `dolly-v2-12b` is not a state-of-the-art model, but it does exhibit surprisingly
+high-quality instruction-following behavior that is not characteristic of the foundation model on which it is based.
+
+For detailed info on the models, their training, and their behavior, please see the [Dolly repository](https://github.com/databrickslabs/dolly).
+
+To see all the available checkpoints for Dolly, run:
+
+```bash
+python scripts/download.py | grep dolly
+```
+
+which will print
+
+```text
+databricks/dolly-v2-3b
+databricks/dolly-v2-7b
+databricks/dolly-v2-12b
+```
+
+To use a specific Dolly checkpoint, for instance [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b), download the weights and convert the checkpoint to the lit-gpt format:
+
+```bash
+pip install huggingface_hub
+
+python scripts/download.py --repo_id databricks/dolly-v2-3b
+
+python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/databricks/dolly-v2-3b
+```
+
+By default, the `convert_hf_checkpoint.py` step keeps the data type of the HF checkpoint's parameters. When RAM
+or disk space is constrained, it can help to pass `--dtype bfloat16` to convert all parameters to this lower precision before continuing.
+
+You're done! To execute the model just run:
+
+```bash
+pip install tokenizers
+
+python generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/databricks/dolly-v2-3b
+```
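
The tutorial's note on `--dtype bfloat16` comes down to simple arithmetic: weight storage scales with bytes per parameter. A rough sketch, assuming a round 3 billion parameters for a "3b" checkpoint (an illustrative figure, not the exact count) and ignoring any non-weight overhead:

```python
# Back-of-the-envelope estimate only. The parameter count is an assumed
# round number, and real checkpoints carry extra metadata and buffers.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def approx_weight_gib(n_params: int, dtype: str) -> float:
    """Approximate size of the raw weights in GiB for a given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


n_params = 3_000_000_000  # assumption: round figure for a "3b" model

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{approx_weight_gib(n_params, dtype):.1f} GiB")
```

If the source checkpoint were stored in float32, converting to bfloat16 would roughly halve the size of the weights. If it is already stored in a 16-bit type, the flag changes little.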