diff --git a/README.md b/README.md index f8fe3f23d4..b464bcfe3a 100644 --- a/README.md +++ b/README.md @@ -26,28 +26,29 @@ Hackable [implementation](litgpt/model.py) of state-of-the-art open-source large Supports the following popular model checkpoints: -| Model | Model size | Reference | -|--------------------------------------------------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------| -| [Code Llama](tutorials/download_code_llama.md) by Meta AI | 7B, 13B, 34B, 70B | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) | -| [Dolly](tutorials/download_dolly.md) by Databricks | 3B, 7B, 12B | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) | -| [Falcon](tutorials/download_falcon.md) by TII UAE | 7B, 40B, 180B | [TII 2023](https://falconllm.tii.ae) | -| [FreeWilly2](tutorials/download_freewilly_2.md) (Stable Beluga 2) by Stability AI | 70B | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) | -| [Function Calling Llama 2](tutorials/download_function_calling_llama_2.md) by Trelis | 7B | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) | -| [Gemma](tutorials/download_gemma.md) by Google | 2B, 7B | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) | -| [Llama 2](tutorials/download_llama_2.md) by Meta AI | 7B, 13B, 70B | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) | -| [LongChat](tutorials/download_longchat.md) by LMSYS | 7B, 13B | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) | -| [Mistral and Mixtral](tutorials/download_mistral.md) by Mistral AI | 7B | [Mistral website](https://mistral.ai/) | -| [Nous-Hermes](https://huggingface.co/NousResearch/Nous-Hermes-13b) by NousResearch | 7B, 13B, 70B | [Org page](https://huggingface.co/NousResearch) | -| [OpenLLaMA](tutorials/download_openllama.md) by OpenLM Research | 3B, 7B, 13B | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) | -| [Phi](tutorials/download_phi.md) by Microsoft Research | 1.3B, 2.7B | [Li et al. 2023](https://arxiv.org/abs/2309.05463) | -| [Platypus](https://huggingface.co/garage-bAInd/Platypus-30B) by Lee at el. | 7B, 13B, 70B | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) | -| [Pythia](tutorials/download_pythia.md) by EleutherAI | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) | -| [RedPajama-INCITE](tutorials/download_redpajama_incite.md) by Together | 3B, 7B | [Together 2023](https://together.ai/blog/redpajama-models-v1) | -| [StableCode](tutorials/download_stablecode.md) by Stability AI | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) | -| [StableLM](tutorials/download_stablelm.md) by Stability AI | 3B, 7B | [Stability AI 2023](https://github.com/Stability-AI/StableLM) | -| [StableLM Zephyr](tutorials/download_stablelm.md) by Stability AI | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) | -| [TinyLlama](tutorials/download_tinyllama.md) by Zhang et al. | 1.1B | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) | -| [Vicuna](tutorials/download_vicuna.md) by LMSYS | 7B, 13B, 33B | [Li et al. 
2023](https://lmsys.org/blog/2023-03-30-vicuna/) |
+| Model | Model size | Reference |
+|----------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
+| Code Llama by Meta AI | 7B, 13B, 34B, 70B | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
+| Dolly by Databricks | 3B, 7B, 12B | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
+| Falcon by TII UAE | 7B, 40B, 180B | [TII 2023](https://falconllm.tii.ae) |
+| FreeWilly2 (Stable Beluga 2) by Stability AI | 70B | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) |
+| Function Calling Llama 2 by Trelis | 7B | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) |
+| Gemma by Google | 2B, 7B | [Google Team, Google DeepMind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) |
+| Llama 2 by Meta AI | 7B, 13B, 70B | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
+| LongChat by LMSYS | 7B, 13B | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
+| Mistral and Mixtral by Mistral AI | 7B | [Mistral website](https://mistral.ai/) |
+| Nous-Hermes by NousResearch | 7B, 13B, 70B | [Org page](https://huggingface.co/NousResearch) |
+| OpenLLaMA by OpenLM Research | 3B, 7B, 13B | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) |
+| Phi by Microsoft Research | 1.3B, 2.7B | [Li et al. 2023](https://arxiv.org/abs/2309.05463) |
+| Platypus by Lee et al. | 7B, 13B, 70B | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) |
+| Pythia by EleutherAI | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) |
+| RedPajama-INCITE by Together | 3B, 7B | [Together 2023](https://together.ai/blog/redpajama-models-v1) |
+| StableCode by Stability AI | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |
+| StableLM by Stability AI | 3B, 7B | [Stability AI 2023](https://github.com/Stability-AI/StableLM) |
+| StableLM Zephyr by Stability AI | 3B | [Stability AI 2023](https://stability.ai/news/stablelm-zephyr-3b-stability-llm) |
+| TinyLlama by Zhang et al. | 1.1B | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) |
+| Vicuna by LMSYS | 7B, 13B, 33B | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/) |
+
 This implementation extends on [Lit-LLaMA](https://github.com/lightning-AI/lit-llama) and [nanoGPT](https://github.com/karpathy/nanoGPT), and it's **powered by [Lightning Fabric](https://lightning.ai/docs/fabric/stable/) ⚡**. @@ -96,7 +97,7 @@ pip install 'litgpt[all]' ## Use the model
-To generate text predictions, you need to download the model weights. **If you don't have them, check out our [guide](tutorials/download_stablelm.md).**
+To generate text predictions, you need to download the model weights. 
**If you don't have them, check out our [guide](tutorials/download_model_weights.md).** Run inference: diff --git a/tutorials/convert_lit_models.md b/tutorials/convert_lit_models.md index beba3b32e4..2a1a798ab5 100644 --- a/tutorials/convert_lit_models.md +++ b/tutorials/convert_lit_models.md @@ -47,7 +47,7 @@ model = AutoModel.from_pretrained("online_repo_id", state_dict=state_dict) Please note that if you want to convert a model that has been fine-tuned using an adapter like LoRA, these weights should be [merged](../litgpt/scripts/merge_lora.py) to the checkpoint prior to converting. ```sh -python scripts/merge_lora.py \ +python litgpt/scripts/merge_lora.py \ --checkpoint_dir path/to/lora/checkpoint_dir ``` @@ -94,7 +94,7 @@ python litgpt/finetune/lora.py \ Note that this step only applies if the model was finetuned with `lora.py` above and not when `full.py` was used for finetuning. ```bash -python scripts/merge_lora.py \ +python litgpt/scripts/merge_lora.py \ --checkpoint_dir $finetuned_dir/final ``` diff --git a/tutorials/download_code_llama.md b/tutorials/download_code_llama.md deleted file mode 100644 index 2fc4fda50d..0000000000 --- a/tutorials/download_code_llama.md +++ /dev/null @@ -1,53 +0,0 @@ -## Download [Code Llama](https://ai.meta.com/blog/code-llama-large-language-model-coding/) weights - -Meta developed and publicly released the Code Llama family of large language models (LLMs) on top of Llama 2. - -Code Llama models come in three sizes: 7B, 13B, and 34B parameter models. Furthermore, there are three model versions for each size: - -- Code Llama: A base model trained on 500B tokens, then and finetuned on 20B tokens. -- Code Llama-Python: The Code Llama model pretrained on 500B tokens, further trained on 100B additional Python code tokens, and then finetuned on 20B tokens. -- Code Llama-Instruct: The Code Llama model trained on 500B tokens, finetuned on 20B tokens, and instruction-finetuned on additional 5B tokens. - -All models were trained on 16,000 token contexts and support generations with up to 100,000 tokens of context. - -To see all the available checkpoints, run: - -```bash -python litgpt/scripts/download.py | grep CodeLlama -``` - -which will print - -```text -codellama/CodeLlama-7b-hf -codellama/CodeLlama-13b-hf -codellama/CodeLlama-34b-hf -codellama/CodeLlama-70b-hf -codellama/CodeLlama-7b-Python-hf -codellama/CodeLlama-13b-Python-hf -codellama/CodeLlama-34b-Python-hf -codellama/CodeLlama-70b-Python-hf -codellama/CodeLlama-7b-Instruct-hf -codellama/CodeLlama-13b-Instruct-hf -codellama/CodeLlama-34b-Instruct-hf -codellama/CodeLlama-70b-Instruct-hf -``` - -In order to use a specific checkpoint, for instance [CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf), download the weights and convert the checkpoint to the litgpt format. - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id codellama/CodeLlama-7b-Python-hf -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! 
To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/chat/base.py --checkpoint_dir checkpoints/codellama/CodeLlama-7b-Python-hf/ -``` diff --git a/tutorials/download_dolly.md b/tutorials/download_dolly.md deleted file mode 100644 index a54911cc22..0000000000 --- a/tutorials/download_dolly.md +++ /dev/null @@ -1,43 +0,0 @@ -## Download [Dolly](https://github.com/databrickslabs/dolly) weights - -Databricks’ [Dolly](https://huggingface.co/databricks/dolly-v2-12b) is an instruction-following large language model trained on the Databricks machine learning platform -that is licensed for commercial use. Based on `pythia-12b`, Dolly is trained on ~15k instruction/response fine tuning records -[`databricks-dolly-15k`](https://huggingface.co/datasets/databricks/databricks-dolly-15k) generated -by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation, -information extraction, open QA and summarization. `dolly-v2-12b` is not a state-of-the-art model, but does exhibit surprisingly -high quality instruction following behavior not characteristic of the foundation model on which it is based. - -For detailed info on the models, their training, and their behavior, please see the [Dolly repository](https://github.com/databrickslabs/dolly). - -To see all the available checkpoints for Dolly, run: - -```bash -python litgpt/scripts/download.py | grep dolly -``` - -which will print - -```text -databricks/dolly-v2-3b -databricks/dolly-v2-7b -databricks/dolly-v2-12b -``` - -In order to use a specific Dolly checkpoint, for instance [dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id databricks/dolly-v2-3b -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install tokenizers - -python litgpt/generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/databricks/dolly-v2-3b -``` diff --git a/tutorials/download_falcon.md b/tutorials/download_falcon.md deleted file mode 100644 index f9e0ca0095..0000000000 --- a/tutorials/download_falcon.md +++ /dev/null @@ -1,43 +0,0 @@ -## Download [Falcon](https://falconllm.tii.ae) weights - -UAE's Technology Innovation Institute has open-sourced Falcon LLM. -It is trained on [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora - Weights are released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). - -The first Falcon release includes a base model and an instruction tuned model of sizes 7B and 40B called `falcon-7b-instruct` and `falcon-40b-instruct`. Recently, checkpoints for 180B parameter models were added as well; the 180B instruction tuned model is called `falcon-180B-chat` and similar to the `falcon-40b-instruct` architecture except for its larger size. 
- -To see all the available checkpoints for Falcon, run: - -```bash -python litgpt/scripts/download.py | grep falcon -``` - -which will print - -```text -tiiuae/falcon-7b -tiiuae/falcon-7b-instruct -tiiuae/falcon-40b -tiiuae/falcon-40b-instruct -tiiuae/falcon-180B -tiiuae/falcon-180B-chat -``` - -In order to use a specific Falcon checkpoint, for instance [falcon-7b](https://huggingface.co/tiiuae/falcon-7b), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id tiiuae/falcon-7b -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install tokenizers - -python litgpt/generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/tiiuae/falcon-7b -``` diff --git a/tutorials/download_freewilly_2.md b/tutorials/download_freewilly_2.md deleted file mode 100644 index 491bc6431b..0000000000 --- a/tutorials/download_freewilly_2.md +++ /dev/null @@ -1,22 +0,0 @@ - -## Download [FreeWilly 2](https://stability.ai/blog/freewilly-large-instruction-fine-tuned-models) weights - -Stability AI announced FreeWilly inspired by the methodology pioneered by Microsoft in its paper: "Orca: Progressive Learning from Complex Explanation Traces of GPT-4”. -FreeWilly2 leverages the Llama 2 70B foundation model to reach a performance that compares favorably with GPT-3.5 for some tasks. - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id stabilityai/FreeWilly2 -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/chat/base.py --checkpoint_dir checkpoints/stabilityai/FreeWilly2 -``` diff --git a/tutorials/download_function_calling_llama_2.md b/tutorials/download_function_calling_llama_2.md deleted file mode 100644 index 1d09d1ffa7..0000000000 --- a/tutorials/download_function_calling_llama_2.md +++ /dev/null @@ -1,30 +0,0 @@ -## Download [Function Calling Llama 2](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) weights - -Llama-7B with function calling is licensed according to the Meta Community license. - -Function calling Llama extends the hugging face Llama 2 models with function calling capabilities. -The model responds with a structured json argument with the function name and arguments. - -In order to use the checkpoint, download the weights and convert the checkpoint to the litgpt format. - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id Trelis/Llama-2-7b-chat-hf-function-calling-v2 -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. 
In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/chat/base.py --checkpoint_dir Trelis/Llama-2-7b-chat-hf-function-calling-v2 -``` -Is strongly recommended to visit the model [repository](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) to know how to format the prompt. - -The chat script has a generic use case with a single function defined, feel free to play with it to fit your needs, for instance to make HTTP requests with the model outputs. - -Have fun! diff --git a/tutorials/download_gemma.md b/tutorials/download_gemma.md deleted file mode 100644 index ec16fa4bb9..0000000000 --- a/tutorials/download_gemma.md +++ /dev/null @@ -1,43 +0,0 @@ -## Download [Gemma](https://blog.google/technology/developers/gemma-open-models/) weights - -Google developed and publicly released the Gemma large language models (LLMs), a collection of pretrained models in 2B and 7B parameter size that are based on the Gemini architecture. - -For more information, please see the [technical report](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf). - - -To see all the available checkpoints, run: - -```bash -python litgpt/scripts/download.py | grep gemma -``` - -which will print - -```text -google/gemma-7b -google/gemma-2b -google/gemma-7b-it -google/gemma-2b-it -``` - -In the list above, `gemma-2b` and `gemma-7b` are the pretrained models, and `gemma-2b-it` and `gemma-7b-it` are the instruction-finetuned models. - -In order to use a specific checkpoint, for instance [gemma-2b](https://huggingface.co/google/gemma-2b), download the weights and convert the checkpoint to the litgpt format. - -This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at . -After access is granted, you can find your HF hub token in . - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id google/gemma-2b --access_token your_hf_token -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -python litgpt/chat/base.py --checkpoint_dir checkpoints/google/gemma-2b -``` diff --git a/tutorials/download_llama_2.md b/tutorials/download_llama_2.md deleted file mode 100644 index d03c054954..0000000000 --- a/tutorials/download_llama_2.md +++ /dev/null @@ -1,50 +0,0 @@ -## Download [Llama 2](https://ai.meta.com/llama) weights - -Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and -fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Its fine-tuned LLMs, -called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on -most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular -closed-source models like ChatGPT and PaLM. 
- -Llama 2 models are trained on 2 trillion tokens (40% more data than LLaMA 1) and have double the context length of LLaMA 1 (4096 tokens). - -Llama 2 comes in a range of parameter sizes — 7B, 13B, and 70B — as well as pretrained and fine-tuned variations. - -To see all the available checkpoints, run: - -```bash -python litgpt/scripts/download.py | grep Llama-2 -``` - -which will print - -```text -meta-llama/Llama-2-7b-hf -meta-llama/Llama-2-7b-chat-hf -meta-llama/Llama-2-13b-hf -meta-llama/Llama-2-13b-chat-hf -meta-llama/Llama-2-70b-hf -meta-llama/Llama-2-70b-chat-hf -``` - -In order to use a specific checkpoint, for instance [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), download the weights and convert the checkpoint to the litgpt format. - -This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at . -After access is granted, you can find your HF hub token in . - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id meta-llama/Llama-2-7b-chat-hf --access_token your_hf_token -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/chat/base.py --checkpoint_dir checkpoints/meta-llama/Llama-2-7b-chat-hf -``` diff --git a/tutorials/download_longchat.md b/tutorials/download_longchat.md deleted file mode 100644 index 1daf604b87..0000000000 --- a/tutorials/download_longchat.md +++ /dev/null @@ -1,36 +0,0 @@ -## Download [LongChat](https://lmsys.org/blog/2023-06-29-longchat) weights - -LongChat is an open-source family of chatbots based on LLaMA featuring an extended context length up to 16K tokens. -The technique used to extend the context length is described in [this blogpost](https://kaiokendev.github.io/context). - -To see all the available checkpoints, run: - -```bash -python litgpt/scripts/download.py | grep longchat -``` - -which will print - -```text -lmsys/longchat-7b-16k -lmsys/longchat-13b-16k -``` - -In order to use a specific checkpoint, for instance [longchat-7b-16k](https://huggingface.co/lmsys/longchat-7b-16k), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id lmsys/longchat-7b-16k -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/chat/base.py --checkpoint_dir checkpoints/lmsys/longchat-7b-16k -``` diff --git a/tutorials/download_mistral.md b/tutorials/download_mistral.md deleted file mode 100644 index f95a76a6b3..0000000000 --- a/tutorials/download_mistral.md +++ /dev/null @@ -1,72 +0,0 @@ -## Download [Mistral](https://mistral.ai) weights - -### Mistral - -[Mistral 7B](https://mistral.ai/news/announcing-mistral-7b) is Apache 2.0 licensed and can be used without restrictions. 
It: - -* Outperforms Llama 2 13B on all benchmarks -* Outperforms Llama 1 34B on many benchmarks -* Approaches CodeLlama 7B performance on code, while remaining good at English tasks -* Uses Grouped-query attention (GQA) for faster inference -* ~~Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost~~. - This project's implementation does not use Sliding Window Attention, so the context length is limited to 4096 tokens. - -Details about the data used to train the model or training procedure have not been made public. - -To see all the available checkpoints, run: - -```bash -python litgpt/scripts/download.py | grep -E 'Mistral|Mixtral' -``` - -which will print - -```text -mistralai/Mistral-7B-v0.1 -mistralai/Mistral-7B-Instruct-v0.1 -mistralai/Mixtral-8x7B-v0.1 -mistralai/Mixtral-8x7B-Instruct-v0.1 -mistralai/Mistral-7B-Instruct-v0.2 -``` - -In order to use the Mistral 7B model checkpoint, which requires about 14 GB of disk space, download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id mistralai/Mistral-7B-Instruct-v0.2 -``` - -You're done! To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/chat/base.py --checkpoint_dir checkpoints/mistralai/Mistral-7B-Instruct-v0.2 -``` - -### Mixtral - -[Mixtral 8x7B](https://mistral.ai/news/mixtral-of-experts) is a pretrained generative Sparse Mixture of Experts model based on Mistral 7B. -Mistral-8x7B outperforms Llama 2 70B on most benchmarks tested. - -Details about the data used to train the model or training procedure have not been made public. - -In order to use the Mixtral 7B model checkpoint, which requires about 94 GB of disk space, download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id mistralai/Mixtral-8x7B-Instruct-v0.1 -``` - -Due to the size of the model, currently only the multi-device sequential generation script can handle it. - -```bash -pip install sentencepiece - -python litgpt/generate/sequentially.py --checkpoint_dir checkpoints/mistralai/Mixtral-8x7B-Instruct-v0.1 -``` - -You will need enough devices (2, 4, or 8) where their combined memory is higher than 94 GB to fit the model in memory. -Please check out [this section](inference.md#run-a-large-model-on-multiple-smaller-devices) for more information about this script. diff --git a/tutorials/download_model_weights.md b/tutorials/download_model_weights.md new file mode 100644 index 0000000000..936eea8a8a --- /dev/null +++ b/tutorials/download_model_weights.md @@ -0,0 +1,251 @@ +# Download Model Weights with LitGPT + +LitGPT supports a variety of LLM architectures with publicly available weights. You can download model weights and access a list of supported models using the LitGPT `download.py` script. + +  +## General Instructions + + +### 1. 
List Available Models + +To see all supported models, run the following command without arguments: + +```bash +python litgpt/scripts/download.py +``` + +The output is shown below: + +``` +stabilityai/stablelm-base-alpha-3b +stabilityai/stablelm-base-alpha-7b +stabilityai/stablelm-tuned-alpha-3b +stabilityai/stablelm-tuned-alpha-7b +stabilityai/stablelm-3b-4e1t +stabilityai/stablelm-zephyr-3b +stabilityai/stablecode-completion-alpha-3b +stabilityai/stablecode-completion-alpha-3b-4k +stabilityai/stablecode-instruct-alpha-3b +stabilityai/stable-code-3b +EleutherAI/pythia-14m +EleutherAI/pythia-31m +EleutherAI/pythia-70m +EleutherAI/pythia-160m +EleutherAI/pythia-410m +EleutherAI/pythia-1b +EleutherAI/pythia-1.4b +EleutherAI/pythia-2.8b +EleutherAI/pythia-6.9b +EleutherAI/pythia-12b +EleutherAI/pythia-70m-deduped +EleutherAI/pythia-160m-deduped +EleutherAI/pythia-410m-deduped +EleutherAI/pythia-1b-deduped +EleutherAI/pythia-1.4b-deduped +EleutherAI/pythia-2.8b-deduped +EleutherAI/pythia-6.9b-deduped +EleutherAI/pythia-12b-deduped +databricks/dolly-v2-3b +databricks/dolly-v2-7b +databricks/dolly-v2-12b +togethercomputer/RedPajama-INCITE-Base-3B-v1 +togethercomputer/RedPajama-INCITE-Chat-3B-v1 +togethercomputer/RedPajama-INCITE-Instruct-3B-v1 +togethercomputer/RedPajama-INCITE-7B-Base +togethercomputer/RedPajama-INCITE-7B-Chat +togethercomputer/RedPajama-INCITE-7B-Instruct +togethercomputer/RedPajama-INCITE-Base-7B-v0.1 +togethercomputer/RedPajama-INCITE-Chat-7B-v0.1 +togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 +tiiuae/falcon-7b +tiiuae/falcon-7b-instruct +tiiuae/falcon-40b +tiiuae/falcon-40b-instruct +tiiuae/falcon-180B +tiiuae/falcon-180B-chat +openlm-research/open_llama_3b +openlm-research/open_llama_7b +openlm-research/open_llama_13b +lmsys/vicuna-7b-v1.3 +lmsys/vicuna-13b-v1.3 +lmsys/vicuna-33b-v1.3 +lmsys/vicuna-7b-v1.5 +lmsys/vicuna-7b-v1.5-16k +lmsys/vicuna-13b-v1.5 +lmsys/vicuna-13b-v1.5-16k +lmsys/longchat-7b-16k +lmsys/longchat-13b-16k +NousResearch/Nous-Hermes-llama-2-7b +NousResearch/Nous-Hermes-13b +NousResearch/Nous-Hermes-Llama2-13b +meta-llama/Llama-2-7b-hf +meta-llama/Llama-2-7b-chat-hf +meta-llama/Llama-2-13b-hf +meta-llama/Llama-2-13b-chat-hf +meta-llama/Llama-2-70b-hf +meta-llama/Llama-2-70b-chat-hf +google/gemma-2b +google/gemma-7b +google/gemma-2b-it +google/gemma-7b-it +stabilityai/FreeWilly2 +codellama/CodeLlama-7b-hf +codellama/CodeLlama-13b-hf +codellama/CodeLlama-34b-hf +codellama/CodeLlama-70b-hf +codellama/CodeLlama-7b-Python-hf +codellama/CodeLlama-13b-Python-hf +codellama/CodeLlama-34b-Python-hf +codellama/CodeLlama-70b-Python-hf +codellama/CodeLlama-7b-Instruct-hf +codellama/CodeLlama-13b-Instruct-hf +codellama/CodeLlama-34b-Instruct-hf +codellama/CodeLlama-70b-Instruct-hf +garage-bAInd/Platypus-30B +garage-bAInd/Platypus2-7B +garage-bAInd/Platypus2-13B +garage-bAInd/Platypus2-70B +garage-bAInd/Camel-Platypus2-13B +garage-bAInd/Camel-Platypus2-70B +garage-bAInd/Stable-Platypus2-13B +garage-bAInd/Platypus2-70B-instruct +togethercomputer/LLaMA-2-7B-32K +microsoft/phi-1_5 +microsoft/phi-2 +mistralai/Mistral-7B-v0.1 +mistralai/Mistral-7B-Instruct-v0.1 +mistralai/Mixtral-8x7B-v0.1 +mistralai/Mixtral-8x7B-Instruct-v0.1 +mistralai/Mistral-7B-Instruct-v0.2 +TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T +TinyLlama/TinyLlama-1.1B-Chat-v1.0 +Trelis/Llama-2-7b-chat-hf-function-calling-v2 +``` + +  +### 2. Download Model Weights + +To download the weights for a specific model, use the `--repo_id` argument. Replace `` with the model's repository ID. 
For example:
+
+```bash
+python litgpt/scripts/download.py --repo_id <repo_id>
+```
+This command downloads the model checkpoint into the `checkpoints/<repo_id>` directory.
+
+ 
+### 3. Additional Help
+
+
+For more options, add the `--help` flag when running the script:
+
+```bash
+python litgpt/scripts/download.py --help
+```
+
+ 
+### 4. Run the Model
+
+After conversion, run the model with the `--checkpoint_dir` flag, adjusting `repo_id` accordingly:
+
+```bash
+python litgpt/chat/base.py --checkpoint_dir checkpoints/<repo_id>
+```
+
+ 
+## TinyLlama Example
+
+This section shows a typical end-to-end example for downloading and using TinyLlama:
+
+1. List available TinyLlama checkpoints:
+
+```bash
+python litgpt/scripts/download.py | grep Tiny
+```
+
+```
+TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+TinyLlama/TinyLlama-1.1B-Chat-v1.0
+```
+
+2. Download a TinyLlama checkpoint:
+
+```bash
+export repo_id=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+python litgpt/scripts/download.py --repo_id $repo_id
+```
+
+3. Use the TinyLlama model:
+
+```bash
+python litgpt/chat/base.py --checkpoint_dir checkpoints/$repo_id
+```
+
+ 
+## Specific Models
+
+Note that certain models require that you've been granted access to the weights on the HuggingFace Hub.
+
+For example, to get access to the Gemma 2B model, follow the steps at https://huggingface.co/google/gemma-2b. After access is granted, you can find your HF hub token in https://huggingface.co/settings/tokens.
+
+Once you've been granted access and obtained the access token, pass it via the additional `--access_token` argument:
+
+```bash
+python litgpt/scripts/download.py \
+  --repo_id google/gemma-2b \
+  --access_token your_hf_token
+```
+
+
+ 
+## Tips for GPU Memory Limitations
+
+The `download.py` script will automatically convert the downloaded model checkpoint into a LitGPT-compatible format. In case this conversion fails due to GPU memory constraints, you can try to reduce the memory requirements by passing the `--dtype bf16-true` flag to convert all parameters into this smaller precision (however, note that most model weights are already in a bfloat16 format, so it may not have any effect):
+
+
+```bash
+python litgpt/scripts/download.py \
+  --repo_id <repo_id> \
+  --dtype bf16-true
+```
+
+(If your GPU does not support the bfloat16 format, you can also try a regular 16-bit float format via `--dtype 16-true`.)
+
+ 
+## Converting Checkpoints Manually
+
+For development purposes, for example, when adding or experimenting with new model configurations, it may be beneficial to split the weight download and model conversion into two separate steps.
+
+You can do this by passing the `--convert_checkpoint false` option to the download script:
+
+```bash
+python litgpt/scripts/download.py \
+  --repo_id <repo_id> \
+  --convert_checkpoint false
+```
+
+and then calling the `convert_hf_checkpoint.py` script:
+
+```bash
+python litgpt/scripts/convert_hf_checkpoint.py \
+  --checkpoint_dir checkpoint_dir/<repo_id>
+```
+
+ 
+## Downloading Tokenizers Only
+
+In some cases we don't need the model weights, for example, when we are pretraining a model from scratch instead of finetuning it. For cases like this, you can use the `--tokenizer_only` flag to only download a model's tokenizer, which can then be used in the pretraining scripts:
+
+```bash
+python litgpt/scripts/download.py \
+  --repo_id TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T \
+  --tokenizer_only true
+```
+
+and
+
+```bash
+python litgpt/pretrain.py \
+  --data ... 
\ + --model_name tiny-llama-1.1b \ + --tokenizer_dir checkpoints/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/ +``` \ No newline at end of file diff --git a/tutorials/download_openllama.md b/tutorials/download_openllama.md deleted file mode 100644 index 652ea39122..0000000000 --- a/tutorials/download_openllama.md +++ /dev/null @@ -1,38 +0,0 @@ -## Download [OpenLLaMA](https://github.com/openlm-research/open_llama) weights - -OpenLLaMA is a permissively licensed open source reproduction of [Meta AI’s LLaMA](https://github.com/facebookresearch/llama) -7B and 13B checkpoints trained on the [RedPajama dataset](https://github.com/togethercomputer/RedPajama-Data). -The weights can serve as the drop in replacement of LLaMA in existing implementations. We also provide a smaller 3B variant. - -To see all the available checkpoints for Open LLaMA, run: - -```bash -python litgpt/scripts/download.py | grep open_llama -``` - -which will print - -```text -openlm-research/open_llama_3b -openlm-research/open_llama_7b -openlm-research/open_llama_13b -``` - -In order to use a specific OpenLLaMA checkpoint, for instance [open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id openlm-research/open_llama_3b -``` - -By default, the convert_hf_checkpoint step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/openlm-research/open_llama_3b -``` diff --git a/tutorials/download_phi.md b/tutorials/download_phi.md deleted file mode 100644 index d5d22495f9..0000000000 --- a/tutorials/download_phi.md +++ /dev/null @@ -1,73 +0,0 @@ -## Download [phi](https://arxiv.org/abs/2309.05463) weights - -### Phi 2 - -Microsoft Research [released](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) Phi 2, which is a 2.7 billion parameter model trained on "textbook-quality" data with knowledge distillation from Phi 1.5. The model achieves sota results among base LLMs with less than 13B parameters and matches or outperforms models up to 25x larger on complex benchmarks, e.g. it achieves better performance compared to 25x larger Llama-2-70B model on multi-step reasoning tasks, i.e., coding and math. Phi 2 was trained on 1.4T tokens and has not undergone any RLHF alignment nor has it been instruct fine-tuned. Phi 2 shares the same architecture with Phi 1.5 and has context length of 2048 tokens. -The model weights are released under [*Microsoft Research license*](https://huggingface.co/microsoft/phi-2#license). - -To download the model weights and convert them to the litgpt format, run - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id microsoft/phi-2 -``` - -> [!WARNING] -> Phi-2 used [dropout](https://huggingface.co/microsoft/phi-2/blob/cb2f453/config.json#L26) during training which we don't model, so training will not be equal. 
- -Inference the model in instruct mode: - -```bash -python litgpt/chat/base.py --checkpoint_dir checkpoints/microsoft/phi-2 -``` -```text ->> Prompt: Write a detailed analogy between mathematics and a lighthouse. ->> Reply: Mathematics is like a lighthouse. Mathematics provides a method to guide us through the sometimes chaotic and confusing waters of life. It provides a structured approach to problems which can help us find our way and provide direction. Just as a lighthouse keeps watch over the sea, mathematics can provide us with the tools to try and make sense of the world. Furthermore, just as a lighthouse keeps a watchful eye on the horizon, mathematics can help us reach our goals by showing us the way. -``` - -> [!NOTE] -> In order to obtain appropriate answers, you may need to tweak the [input prompt](https://github.com/Lightning-AI/litgpt/blob/wip/litgpt/prompts.py#L252). E.g. we found out that if using `"Instruct:{prompt}\nOutput:\n"` instead of `"Instruct:{prompt}\nOutput:"` the model generates longer answers in some cases. - -Free generation mode: -```bash -python litgpt/generate/base.py --prompt "Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?\nBob:" --checkpoint_dir checkpoints/microsoft/phi-2 -``` -which yields -```text -Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions? -Bob: Well, one possible reason could be stress. Have you been feeling overwhelmed lately? -Alice: Yes, I've been juggling multiple deadlines and it's been quite taxing. -Carol: Stress can definitely impact your ability to concentrate. Maybe you need -``` - -### Phi 1.5 - -A team at Microsoft Research has made available Phi 1.5, which is a 1.3 billion parameter model optimized for common sense reasoning in natural language, showing performance on par with models 5x its size, especially in grade-school mathematics and basic coding. This model retains characteristics of larger LLMs, and significant improvement was noted in reducing toxic and biased generations by avoiding web data. It's also worth highlighting that while this model performs well on language understanding and common sense reasoning tasks, it is a base model that has not undergone any supervised instruction finetuning or finetuning with RLHF. - -The model was trained the same data sources (7B tokens) as its [phi-1](https://arxiv.org/abs/2306.11644) predecessor, which includes - -- a Python code subset from [The Stack](https://arxiv.org/abs/2211.15533) v1.2 -- Q&A texts from [StackOverflow](https://archive.org/download/stackexchange) -- code from DeepMind [code_contests](https://github.com/deepmind/code_contests) -- synthetic Python textbooks and exercises generated by [gpt-3.5-turbo-0301](https://platform.openai.com/docs/models/gpt-3-5) - -In addition, to create phi-1.5, the authors included additional textbook-quality synthetic text (roughly 20B tokens) in natural language, which was created using the [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) approach. - -The model weights are released under a [*Microsoft Research license*](https://huggingface.co/microsoft/phi-1_5/blob/main/README.md#license). - -In order to use the phi-1.5 model checkpoint, which requires about 3 Gb of disk space, download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id microsoft/phi-1_5 -``` - -You're done! 
To execute the model just run: - -```bash -pip install tokenizers - -python litgpt/generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/microsoft/phi-1_5 -``` diff --git a/tutorials/download_pythia.md b/tutorials/download_pythia.md deleted file mode 100644 index aebcd496df..0000000000 --- a/tutorials/download_pythia.md +++ /dev/null @@ -1,54 +0,0 @@ -## Download [Pythia](https://github.com/EleutherAI/pythia) weights - -EleutherAI's project Pythia combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. Weights are released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). - -For detailed info on the models, their training, and their behavior, please see the [Pythia repository](https://github.com/EleutherAI/pythia). -It includes a suite of 8 checkpoints (weights) on 2 different datasets: [The Pile](https://pile.eleuther.ai/), as well as The Pile with deduplication applied. In addition there are two small models that come only in non-deduplicated form: `Pythia-14m` and `Pythia-31m`. - -To see all the available checkpoints for Pythia, run: - -```bash -python litgpt/scripts/download.py | grep pythia -``` - -which will print - -```text -EleutherAI/pythia-14m -EleutherAI/pythia-31m -EleutherAI/pythia-70m -EleutherAI/pythia-160m -EleutherAI/pythia-410m -EleutherAI/pythia-1b -EleutherAI/pythia-1.4b -EleutherAI/pythia-2.8b -EleutherAI/pythia-6.9b -EleutherAI/pythia-12b -EleutherAI/pythia-70m-deduped -EleutherAI/pythia-160m-deduped -EleutherAI/pythia-410m-deduped -EleutherAI/pythia-1b-deduped -EleutherAI/pythia-1.4b-deduped -EleutherAI/pythia-2.8b-deduped -EleutherAI/pythia-6.9b-deduped -EleutherAI/pythia-12b-deduped -``` - -In order to use a specific Pythia checkpoint, for instance [pythia-1b](https://huggingface.co/EleutherAI/pythia-1b), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id EleutherAI/pythia-1b -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install tokenizers - -python litgpt/generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/EleutherAI/pythia-1b -``` diff --git a/tutorials/download_redpajama_incite.md b/tutorials/download_redpajama_incite.md deleted file mode 100644 index 0296f09ed9..0000000000 --- a/tutorials/download_redpajama_incite.md +++ /dev/null @@ -1,44 +0,0 @@ -## Download [RedPajama-INCITE](https://www.together.xyz/blog/redpajama-models-v1) weights - -Togethercomputer's RedPajama-INCITE family of models were trained over the [RedPajama v1](https://www.together.xyz/blog/redpajama) dataset, with the same architecture as the popular [Pythia](download_pythia.md) model suite. Weights are released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). - -The release includes a base model, a chat fine-tuned model, and an instruction tuned model of sizes 3B and 7B. 
- -To see all the available checkpoints for RedPajama-INCITE, run: - -```bash -python litgpt/scripts/download.py | grep RedPajama -``` - -which will print - -```text -togethercomputer/RedPajama-INCITE-Base-3B-v1 -togethercomputer/RedPajama-INCITE-Chat-3B-v1 -togethercomputer/RedPajama-INCITE-Instruct-3B-v1 -togethercomputer/RedPajama-INCITE-7B-Base -togethercomputer/RedPajama-INCITE-7B-Chat -togethercomputer/RedPajama-INCITE-7B-Instruct -togethercomputer/RedPajama-INCITE-Base-7B-v0.1 -togethercomputer/RedPajama-INCITE-Chat-7B-v0.1 -togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 -``` - -In order to use a specific RedPajama-INCITE checkpoint, for instance [RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id togethercomputer/RedPajama-INCITE-Base-3B-v1 -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install tokenizers - -python litgpt/generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/togethercomputer/RedPajama-INCITE-Base-3B-v1 -``` diff --git a/tutorials/download_stablecode.md b/tutorials/download_stablecode.md deleted file mode 100644 index 004e9c530a..0000000000 --- a/tutorials/download_stablecode.md +++ /dev/null @@ -1,50 +0,0 @@ -## Download [StableCode](https://huggingface.co/collections/stabilityai/stable-code-64f9dfb4ebc8a1be0a3f7650) weights - -StableCode is a suite of 4 developer assistant models. - -Each one of them is a decoder-only code completion model with 3 billion parameters, pre-trained on a diverse collection of programming languages that ranked highest in the 2023 StackOverflow developer survey. - -For more info on the models, please visit the [StableCode repository](https://huggingface.co/collections/stabilityai/stable-code-64f9dfb4ebc8a1be0a3f7650). - ------- - -To see all the available checkpoints for StableCode, run: - -```bash -python litgpt/scripts/download.py | grep -E "stable-?code" -``` - -which will print: - -```text -stabilityai/stablecode-completion-alpha-3b -stabilityai/stablecode-completion-alpha-3b-4k -stabilityai/stablecode-instruct-alpha-3b -stabilityai/stable-code-3b -``` - -In order to use a specific StableCode checkpoint, for instance [stable-code-3b](https://huggingface.co/stabilityai/stable-code-3b), download the weights and convert the checkpoint to the LitGPT format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -export repo_id=stabilityai/stable-code-3b -python litgpt/scripts/download.py --repo_id $repo_id -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install tokenizers - -python litgpt/generate/base.py --prompt "Write in Python a softmax function. Be concise." 
--checkpoint_dir checkpoints/$repo_id -``` - -Or you can run the model in an interactive mode: - -```bash -python litgpt/chat/base.py --checkpoint_dir checkpoints/$repo_id -``` diff --git a/tutorials/download_stablelm.md b/tutorials/download_stablelm.md deleted file mode 100644 index 3e87c603d6..0000000000 --- a/tutorials/download_stablelm.md +++ /dev/null @@ -1,77 +0,0 @@ -## Download [StableLM](https://github.com/Stability-AI/StableLM) weights - -StableLM is a family of generative language models trained by StabilityAI. - -To see all the available checkpoints for StableLM, run: - -```bash -python litgpt/scripts/download.py | grep stablelm -``` - -which will print: - -```text -stabilityai/stablelm-base-alpha-3b -stabilityai/stablelm-base-alpha-7b -stabilityai/stablelm-tuned-alpha-3b -stabilityai/stablelm-tuned-alpha-7b -stabilityai/stablelm-3b-4e1t -stabilityai/stablelm-zephyr-3b -``` - -In order to use a specific StableLM checkpoint, for instance [stablelm-base-alpha-3b](http://huggingface.co/stabilityai/stablelm-base-alpha-3b), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id stabilityai/stablelm-base-alpha-3b -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - - -> [!Important] -> `stablelm-base-alpha-(3,7)b` and `stablelm-tuned-alpha-(3,7)b` are deprecated and are no longer in the StableLM collection. Last time these models were updated in April 2023. Consider using `stablelm-3b-4e1t` (base model) or `stablelm-zephyr-3b` (instruct fine-tuned). - -### StableLM-3B-4E1T - -StableLM-3B-4E1T is a 3 billion (3B) parameter language model pre-trained under the multi-epoch regime to study the impact of repeated tokens on downstream performance. - -Building on past achievements, StabilityAI underwent training on 1 trillion tokens for 4 epochs, as recommended by Muennighoff et al. (2023) in their study "Scaling Data-Constrained Language Models." They noted that training with repeated data over 4 epochs has minimal impact on loss compared to using unique data. Additionally, insights from "Go smol or go home" (De Vries, 2023) guided the choice of token count. The research suggests that a 2.96B model trained on 2.85 trillion tokens can achieve a loss similar to a compute-optimized 9.87B language model. -More info can be found on [GitHub](https://github.com/Stability-AI/StableLM?tab=readme-ov-file#stablelm-3b-4e1t). - -### StableLM Zephyr 3B - -Lightweight LLM, preference tuned for instruction following and Q&A-type tasks. This model is an extension of the pre-existing StableLM 3B-4e1t model and is inspired by the Zephyr 7B model from HuggingFace. With StableLM Zephyr's 3 billion parameters, this model efficiently caters to a wide range of text generation needs, from simple queries to complex instructional contexts on edge devices. -More details can be found in the [announcement](https://stability.ai/news/stablelm-zephyr-3b-stability-llm). 
- -### Usage - -In order to use a specific StableLM checkpoint, for instance [StableLM Zephyr 3B](https://huggingface.co/stabilityai/stablelm-zephyr-3b), download the weights and convert the checkpoint to the LitGPT format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -export repo_id=stabilityai/stablelm-zephyr-3b -python litgpt/scripts/download.py --repo_id $repo_id -``` - -By default, the `convert_hf_checkpoint` step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install tokenizers - -python litgpt/generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/$repo_id -``` - -Or you can run the model in an interactive mode: - -```bash -python litgpt/chat/base.py --checkpoint_dir checkpoints/$repo_id -``` diff --git a/tutorials/download_tinyllama.md b/tutorials/download_tinyllama.md deleted file mode 100644 index 1a9e25b050..0000000000 --- a/tutorials/download_tinyllama.md +++ /dev/null @@ -1,65 +0,0 @@ -## Download TinyLlama weights - -[TinyLlama 1.1B](https://github.com/jzhang38/TinyLlama/) is Apache 2.0 licensed and can be used without restrictions. -It is still in development and at the time of writing this, checkpoints for the model trained up to 2T tokens are available. -The target is to train it for ~3 epochs on 3T tokens total. For more details on the schedule and progress of the pretraining, see the official [README](https://github.com/jzhang38/TinyLlama/tree/main). - -There are two version of TinyLlama available: a base one and a fine-tuned "Chat" version. -To see all available versions, run: - -```bash -python litgpt/scripts/download.py | grep TinyLlama -``` - -which will print - -```text -TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T -TinyLlama/TinyLlama-1.1B-Chat-v1.0 -``` - -In order to use a specific checkpoint, for instance [TinyLlama 1.1B base model](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T), which requires about 5 GB of disk space, download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T -``` - ------ - -With the `Chat` version of the model, the download and conversion procedures are slightly different. -As this version of the model is stored in `safetensor` format, to download it an additional flag is required: - -```bash -python litgpt/scripts/download.py --repo_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 -``` - -The model is shipped in `bfloat16` format, so if your hardware doesn't support it, you can provide `--dtype` argument during model conversion. For example we can convert the weights into `float32` format: - -```bash -python litgpt/scripts/download.py \ - --repo_id checkpoints/TinyLlama/TinyLlama-1.1B-Chat-v1.0 --dtype=float32 -``` - ------ - -You're done! 
To execute the model just run: - -```bash -pip install sentencepiece - -# base version -python litgpt/chat/base.py --checkpoint_dir checkpoints/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T - -# or - -# chat version -python litgpt/chat/base.py --checkpoint_dir checkpoints/TinyLlama/TinyLlama-1.1B-Chat-v1.0 -``` - -To improve the response from Chat version you can also provide these args (as in the [model card](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)): - -```bash -python litgpt/chat/base.py --checkpoint_dir checkpoints/TinyLlama/TinyLlama-1.1B-Chat-v1.0 --top_k=50 --temperature=0.7 -``` diff --git a/tutorials/download_vicuna.md b/tutorials/download_vicuna.md deleted file mode 100644 index e75c3d37a1..0000000000 --- a/tutorials/download_vicuna.md +++ /dev/null @@ -1,40 +0,0 @@ -## Download [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) weights - -Vicuna is an open-source family of chatbots trained by fine-tuning LLaMA on user-shared conversations collected from [ShareGPT](https://sharegpt.com). - -To see all the available checkpoints for Vicuna, run: - -```bash -python litgpt/scripts/download.py | grep vicuna -``` - -which will print - -```text -lmsys/vicuna-7b-v1.3 -lmsys/vicuna-13b-v1.3 -lmsys/vicuna-33b-v1.3 -lmsys/vicuna-7b-v1.5 -lmsys/vicuna-7b-v1.5-16k -lmsys/vicuna-13b-v1.5 -lmsys/vicuna-13b-v1.5-16k -``` - -In order to use a specific Vicuna checkpoint, for instance [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5), download the weights and convert the checkpoint to the litgpt format: - -```bash -pip install 'huggingface_hub[hf_transfer] @ git+https://github.com/huggingface/huggingface_hub' - -python litgpt/scripts/download.py --repo_id lmsys/vicuna-7b-v1.5 -``` - -By default, the checkpoint conversion step will use the data type of the HF checkpoint's parameters. In cases where RAM -or disk size is constrained, it might be useful to pass `--dtype bfloat16` to convert all parameters into this smaller precision before continuing. - -You're done! To execute the model just run: - -```bash -pip install sentencepiece - -python litgpt/chat/base.py --checkpoint_dir checkpoints/lmsys/vicuna-7b-v1.5 -``` diff --git a/tutorials/finetune_adapter.md b/tutorials/finetune_adapter.md index b73b4f0606..53910bfad4 100644 --- a/tutorials/finetune_adapter.md +++ b/tutorials/finetune_adapter.md @@ -13,7 +13,7 @@ LLaMA-Adapter v2 extends the original LLaMA-Adapter idea by adding trainable bia The steps here only need to be done once: 1. Follow the instructions in the [README](../README.md) to install the dependencies. -2. Download and convert the weights following our [guide](download_stablelm.md). +2. Download and convert the weights following our [guide](download_model_weights.md). LitGPT provides common datasets for finetuning, such as Alpaca, LIMA, Dolly, and more. You can optionally [prepare your own dataset](#tune-on-your-dataset). diff --git a/tutorials/finetune_full.md b/tutorials/finetune_full.md index aeb373102f..ee813539e6 100644 --- a/tutorials/finetune_full.md +++ b/tutorials/finetune_full.md @@ -7,7 +7,7 @@ If you are interested in parameter-efficient finetuning, check out [finetune_ada The steps here only need to be done once: 1. Follow the instructions in the [README](../README.md) to install the dependencies. -2. Download and convert the weights following our [guide](download_stablelm.md). +2. Download and convert the weights following our [guide](download_model_weights.md). 
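The "download and convert the weights" step referenced in these finetuning guides corresponds to a single command from the new [download_model_weights.md](download_model_weights.md) tutorial. A minimal sketch is shown below; the repo ID is just one of the supported checkpoints, so substitute any other ID from the list in that guide:

```bash
# Downloads the HF checkpoint and converts it into the LitGPT format;
# the result lands under ./checkpoints/<repo_id>, where the finetuning scripts expect it.
python litgpt/scripts/download.py --repo_id stabilityai/stablelm-base-alpha-3b
```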
LitGPT provides common datasets for finetuning, such as Alpaca, LIMA, Dolly, and more. You can optionally [prepare your own dataset](#tune-on-your-dataset). diff --git a/tutorials/finetune_lora.md index 3529d2ba19..bd340577f5 100644 --- a/tutorials/finetune_lora.md +++ b/tutorials/finetune_lora.md @@ -11,12 +11,7 @@ The steps here only need to be done once: 1. Follow the instructions in the [README](../README.md) to install the dependencies. 2. Download and convert the weights and save them in the `./checkpoints` folder. - Weights can be downloaded following these instructions: - -- [StableLM](download_stablelm.md) -- [Pythia](download_pythia.md) -- [Redpajama-INCITE](download_redpajama_incite.md) -- [Falcon](download_falcon.md) + Weights can be downloaded following the instructions in the [download_model_weights](download_model_weights.md) documentation. LitGPT provides common datasets for finetuning, such as Alpaca, LIMA, Dolly, and more. You can optionally [prepare your own dataset](#tune-on-your-dataset). @@ -140,7 +135,7 @@ python litgpt/finetune/lora.py \ This code will produce a `lit_model.pth.lora` file in the specified output directory, containing only the LoRA weights. To merge these LoRA weights with the original model checkpoint, you can use the `merge_lora.py` script as follows: ```bash -python scripts/merge_lora.py \ +python litgpt/scripts/merge_lora.py \ --checkpoint_dir "out/lora/stablelm-base-alpha-3b/final" ``` diff --git a/tutorials/inference.md index 4c968a4647..c52c425f41 100644 --- a/tutorials/inference.md +++ b/tutorials/inference.md @@ -12,7 +12,7 @@ Output: Hello, my name is Levi Durrer, I'm an Austrian journalist - Chairman of the Press Blair Party, with 37 years in the Press Blair International, and two years in the Spectre of Austerity for the other. I'm crossing my fingers that you will feel ``` -The script assumes you have downloaded and converted the weights as described [here](download_stablelm.md). +The script assumes you have downloaded and converted the weights as described [here](download_model_weights.md). This will run the 3B pre-trained model and require ~7 GB of GPU memory using the `bfloat16` datatype. diff --git a/tutorials/prepare_dataset.md index 96049ff5a3..c4f9efa571 100644 --- a/tutorials/prepare_dataset.md +++ b/tutorials/prepare_dataset.md @@ -24,7 +24,7 @@ Below is a table of all datasets that are currently supported in LitGPT: The steps here only need to be done once before preparing the finetuning datasets in the following subsections: 1. Follow the instructions in the [README](../README.md) to install the dependencies. -2. Download and convert the weights following our [guide](download_falcon.md). +2. Download and convert the weights following our [guide](download_model_weights.md). For the following examples, we will focus on finetuning with the `litgpt/finetune/lora.py` script and use a Falcon 7B model. However, the same steps apply to all other models and finetuning scripts. diff --git a/xla/README.md index e02c92e9f9..d71a0e0f2c 100644 --- a/xla/README.md +++ b/xla/README.md @@ -78,7 +78,7 @@ export PJRT_DEVICE=TPU > An extensive guide on setup and available options can be found [here](https://cloud.google.com/tpu/docs/v4-users-guide). Since a new machine was created, you may need to download pretrained weights. 
-They can be copied to the machine using `gcloud compute tpus tpu-vm scp`, or you can follow the steps described in our [downloading guide](download_stablelm.md). +They can be copied to the machine using `gcloud compute tpus tpu-vm scp`, or you can follow the steps described in our [downloading guide](download_model_weights.md). It is also recommended to set up a persistent disk from which to load checkpoints. Follow [this guide](https://cloud.google.com/tpu/docs/setup-persistent-disk#setting_up_a_tpu_vm_and_a_persistent_disk) to do so.
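The xla hunk above mentions copying weights to the TPU VM with `gcloud compute tpus tpu-vm scp` but does not show an invocation. A minimal sketch follows, assuming an already-converted checkpoint directory and a TPU VM named `my-tpu-vm` in zone `us-central2-b`; both names and the destination path are placeholders, not values from this repository:

```bash
# Illustrative only: recursively copy a converted checkpoint directory to the TPU VM.
# Replace the checkpoint path, VM name, and zone with your own values.
gcloud compute tpus tpu-vm scp --recurse \
  checkpoints/stabilityai/stablelm-base-alpha-3b \
  my-tpu-vm:~/litgpt/checkpoints/stabilityai/stablelm-base-alpha-3b \
  --zone=us-central2-b
```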