add Llama-3.3-70B-Instruct
ysjprojects committed Dec 7, 2024
1 parent d3345b6 commit 881ffd8
Showing 4 changed files with 27 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -124,6 +124,7 @@ Every model is written from scratch to maximize performance and remove layers of
| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
| Llama 3.1 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
| Llama 3.2 | 1B, 3B | Meta AI | [Meta AI 2024](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/) |
| Llama 3.3 | 70B | Meta AI | [Meta AI 2024](https://ai.meta.com/blog/llama-3-3-large-language-model-family/) |
| Mathstral | 7B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mathstral/) |
| MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama) |
| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
23 changes: 23 additions & 0 deletions litgpt/config.py
@@ -700,8 +700,31 @@ def norm_class(self) -> Type:
rope_base=500000,
rope_adjustments=dict(factor=32.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_seq_len=8192)
),
# https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/config.json
dict(
name="Llama-3.3-70B-Instruct",
hf_config=dict(org="meta-llama", name="Llama-3.3-70B-Instruct"),
block_size=131072,
vocab_size=128000,
padded_vocab_size=128256,
n_layer=80,
n_head=64,
n_embd=8192,
n_query_groups=8,
rotary_percentage=1.0,
parallel_residual=False,
bias=False,
norm_class_name="RMSNorm",
mlp_class_name="LLaMAMLP",
intermediate_size=28672,
rope_base=500000,
rope_adjustments=dict(factor=8.0, low_freq_factor=1.0, high_freq_factor=4.0, original_max_seq_len=8192)
),
]
for c in llama_3:
if c["name"] == "Llama-3.3-70B-Instruct":
configs.append(c)
continue
for kind in ("", "-Instruct"):
copy = deepcopy(c)
copy["name"] = c["name"].format(kind)
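Note: unlike the other `llama_3` entries, whose names contain a `{}` placeholder that the loop fills with `""` or `"-Instruct"`, the new name is appended as-is, so only the Instruct variant is registered. A minimal sketch of how the preset resolves once this commit is installed (assumes `litgpt` is importable from a checkout that includes it):

```python
from litgpt.config import Config

# Sketch only: look up the newly registered preset and check a few of the
# hyperparameters added above.
config = Config.from_name("Llama-3.3-70B-Instruct")
print(config.n_layer, config.n_head, config.n_query_groups)  # 80 64 8
print(config.n_embd, config.intermediate_size)               # 8192 28672
print(config.block_size, config.rope_base)                   # 131072 500000
```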
1 change: 1 addition & 0 deletions tests/test_model.py
@@ -223,6 +223,7 @@ def test_against_original_open_llama_3b(device, dtype):
{"name": "Llama-3.1-8B-Instruct"},
{"name": "Llama-3.2-1B"},
{"name": "Llama-3.2-3B"},
{"name": "Llama-3.3-70B-Instruct"},
],
)
@pytest.mark.parametrize(
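For context, the parametrized tests build a heavily scaled-down version of each architecture and compare it against a reference Hugging Face implementation. A rough sketch of that kind of scaled-down instantiation for the new entry (the override values below are illustrative assumptions, not the ones the test uses):

```python
import torch
from litgpt.config import Config
from litgpt.model import GPT

# Shrink the 70B architecture so it can be built quickly on CPU; every override
# here is an illustrative assumption rather than the test's actual values.
config = Config.from_name(
    "Llama-3.3-70B-Instruct",
    block_size=128,
    padded_vocab_size=10000,
    n_layer=2,
    n_head=8,
    n_embd=32,
    n_query_groups=2,
    intermediate_size=86,
)
model = GPT(config)
logits = model(torch.randint(0, 10000, (1, 8)))
print(logits.shape)  # torch.Size([1, 8, 10000])
```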
2 changes: 2 additions & 0 deletions tutorials/download_model_weights.md
@@ -20,6 +20,7 @@ LitGPT supports a variety of LLM architectures with publicly available weights.
| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
| Llama 3.1 | 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
| Llama 3.2 | 1B, 3B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md) |
| Llama 3.3 | 70B | Meta AI | [Meta AI 2024](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
| Llama 3.1 Nemotron | 70B | NVIDIA | [NVIDIA AI 2024](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-instruct/modelcard) |
| LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) |
| Mathstral | 7B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mathstral/) |
@@ -134,6 +135,7 @@ meta-llama/Llama-3.2-1B
meta-llama/Llama-3.2-1B-Instruct
meta-llama/Llama-3.2-3B
meta-llama/Llama-3.2-3B-Instruct
meta-llama/Llama-3.3-70B-Instruct
meta-llama/Meta-Llama-3-70B
meta-llama/Meta-Llama-3-70B-Instruct
meta-llama/Meta-Llama-3-8B
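With the name registered, the checkpoint can be fetched like any other entry in this list, e.g. `litgpt download meta-llama/Llama-3.3-70B-Instruct` (the repository is gated, so a Hugging Face access token is required). A sketch of the equivalent Python API, which downloads the weights on first use:

```python
from litgpt import LLM

# Assumes a Hugging Face token with access to the gated meta-llama repo is
# already configured (e.g. via `huggingface-cli login`); the 70B checkpoint is
# large, so treat this as a sketch rather than a quick demo.
llm = LLM.load("meta-llama/Llama-3.3-70B-Instruct")
print(llm.generate("What is the capital of France?", max_new_tokens=32))
```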
