update

Lightning-AI · Apr 19, 2024
commit 80e97b0 · 1 parent 6aa55e4
Showing 1 changed file with 24 additions and 23 deletions.
diff --git a/tutorials/download_model_weights.md b/tutorials/download_model_weights.md
@@ -3,29 +3,30 @@
 LitGPT supports a variety of LLM architectures with publicly available weights. You can download model weights and access a list of supported models using the LitGPT `download.py` script.
 
 
-| Model                                        | Model size                               | Reference                                                                                                                    |
-|----------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
-| CodeGemma by Google                          | 7B                                       | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma)                                                                      |
-| Code Llama by Meta AI                        | 7B, 13B, 34B, 70B                        | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950)                                                                      |
-| Dolly by Databricks                          | 3B, 7B, 12B                              | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
-| Falcon by TII UAE                            | 7B, 40B, 180B                            | [TII 2023](https://falconllm.tii.ae)                                                                                         |
-| FreeWilly2 (Stable Beluga 2) by Stability AI | 70B                                      | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models)                             |
-| Function Calling Llama 2 by Trelis           | 7B                                       | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)                                   |
-| Gemma by Google                              | 2B, 7B                                   | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)                         |
-| Llama 2 by Meta AI                           | 7B, 13B, 70B                             | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                                                      |
-| LongChat by LMSYS                            | 7B, 13B                                  | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/)                                                            |
-| Mistral and Mixtral by Mistral AI            | 7B                                       | [Mistral website](https://mistral.ai/)                                                                                       |
-| Nous-Hermes by NousResearch                  | 7B, 13B, 70B                             | [Org page](https://huggingface.co/NousResearch)                                                                              |
-| OpenLLaMA by OpenLM Research                 | 3B, 7B, 13B                              | [Geng & Liu 2023](https://github.com/openlm-research/open_llama)                                                             |
-| Phi by Microsoft Research                    | 1.3B, 2.7B                               | [Li et al. 2023](https://arxiv.org/abs/2309.05463)                                                                           |
-| Platypus by Lee at el.                       | 7B, 13B, 70B                             | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317)                                                               |
-| Pythia by EleutherAI                         | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373)                                                                     |
-| RedPajama-INCITE by Together                 | 3B, 7B                                   | [Together 2023](https://together.ai/blog/redpajama-models-v1)                                                                |
-| StableCode by Stability AI                   | 3B                                       | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                           |
-| StableLM by Stability AI                     | 3B, 7B                                   | [Stability AI 2023](https://github.com/Stability-AI/StableLM)                                                                |
-| StableLM Zephyr by Stability AI              | 3B                                       | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                           |
-| TinyLlama by Zhang et al.                    | 1.1B                                     | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama)                                                                   |
-| Vicuna by LMSYS                              | 7B, 13B, 33B                             | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)                                                                  |
+| Model                                        | Model size                              | Reference                                                                                                                |
+|----------------------------------------------|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
+| CodeGemma by Google                          | 7B                                      | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma)                                                             |
+| Code Llama by Meta AI                        | 7B, 13B, 34B, 70B                       | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950)                                                                  |
+| Dolly by Databricks                          | 3B, 7B, 12B                             | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
+| Falcon by TII UAE                            | 7B, 40B, 180B                           | [TII 2023](https://falconllm.tii.ae)                                                                                     |
+| FreeWilly2 (Stable Beluga 2) by Stability AI | 70B                                     | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models)                         |
+| Function Calling Llama 2 by Trelis           | 7B                                      | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)                               |
+| Gemma by Google                              | 2B, 7B                                  | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)                     |
+| Llama 2 by Meta AI                           | 7B, 13B, 70B                            | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                                                  |
+| Llama 3 by Meta AI                           | 8B, 70B                                 | [Meta AI 2024](https://github.com/meta-llama/llama3)                                                                     |
+| LongChat by LMSYS                            | 7B, 13B                                 | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/)                                                        |
+| Mistral and Mixtral by Mistral AI            | 7B                                      | [Mistral website](https://mistral.ai/)                                                                                   |
+| Nous-Hermes by NousResearch                  | 7B, 13B, 70B                            | [Org page](https://huggingface.co/NousResearch)                                                                          |
+| OpenLLaMA by OpenLM Research                 | 3B, 7B, 13B                             | [Geng & Liu 2023](https://github.com/openlm-research/open_llama)                                                         |
+| Phi by Microsoft Research                    | 1.3B, 2.7B                              | [Li et al. 2023](https://arxiv.org/abs/2309.05463)                                                                       |
+| Platypus by Lee at el.                       | 7B, 13B, 70B                            | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317)                                                           |
+| Pythia by EleutherAI                         | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373)                                                                 |
+| RedPajama-INCITE by Together                 | 3B, 7B                                  | [Together 2023](https://together.ai/blog/redpajama-models-v1)                                                            |
+| StableCode by Stability AI                   | 3B                                      | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                       |
+| StableLM by Stability AI                     | 3B, 7B                                  | [Stability AI 2023](https://github.com/Stability-AI/StableLM)                                                            |
+| StableLM Zephyr by Stability AI              | 3B                                      | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                       |
+| TinyLlama by Zhang et al.                    | 1.1B                                    | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama)                                                               |
+| Vicuna by LMSYS                              | 7B, 13B, 33B                            | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)                                                              |
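
Most of the 24 additions and 23 deletions in this hunk come from re-padding the table columns, so the substantive change is easy to miss. The following is a minimal sketch, not part of the commit, of how one might compare the removed and added rows cell by cell while ignoring padding. The helper names `normalize_row` and `semantic_changes` are hypothetical, and the sample rows are shortened copies of rows from the diff above.

```python
def normalize_row(line: str) -> tuple[str, ...]:
    """Drop the leading diff marker and return the trimmed cell texts of a table row."""
    body = line[1:] if line[:1] in ("+", "-") else line
    cells = [cell.strip() for cell in body.strip().strip("|").split("|")]
    # Separator rows differ only in dash count, so collapse them to one form.
    return tuple("---" if c and set(c) <= {"-"} else c for c in cells)


def semantic_changes(diff_lines: list[str]):
    """Return (added, removed) table rows that differ in content, not just padding."""
    # Only lines that start with "+|" or "-|" are table rows in this hunk.
    removed = {normalize_row(l) for l in diff_lines if l.startswith("-|")}
    added = {normalize_row(l) for l in diff_lines if l.startswith("+|")}
    return sorted(added - removed), sorted(removed - added)


# Shortened sample rows (padding reduced for readability).
sample = [
    "-| Model               | Model size    | Reference  |",
    "-|---------------------|---------------|------------|",
    "-| Llama 2 by Meta AI  | 7B, 13B, 70B  | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)  |",
    "+| Model               | Model size   | Reference |",
    "+|---------------------|--------------|-----------|",
    "+| Llama 2 by Meta AI  | 7B, 13B, 70B | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |",
    "+| Llama 3 by Meta AI  | 8B, 70B      | [Meta AI 2024](https://github.com/meta-llama/llama3) |",
]

added, removed = semantic_changes(sample)
for row in added:
    print("added:", " | ".join(row))
for row in removed:
    print("removed:", " | ".join(row))
```

Applied to the full hunk, a comparison along these lines should report the Llama 3 row (8B, 70B) as the only added entry and no removed entries; the remaining changed lines differ only in column width.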