diff --git a/README.md b/README.md
index f6e426b9fa..4bd3d8bf75 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,5 @@
-LitGPT
-
 # ⚡ LitGPT
@@ -20,11 +18,11 @@ Uses the latest state-of-the-art techniques:

Lightning AI • Models •
- Install •
- Get started •
- Evaluate •
+ Quick start •
+ Inference • Finetune • Pretrain •
+ Deploy • Features • Training recipes (YAML)

@@ -34,13 +32,13 @@ Uses the latest state-of-the-art techniques:   # Finetune, pretrain and deploy LLMs Lightning fast ⚡⚡ -LitGPT is a command-line tool designed to easily [finetune](#finetune-an-llm), [pretrain](#pretrain-an-llm), [evaluate](#use-an-llm), and deploy [20+ LLMs](#choose-from-20-llms) **on your own data**. It features highly-optimized [training recipes](#training-recipes) for the world's most powerful open-source large-language-models (LLMs). +LitGPT is a command-line tool designed to easily [finetune](#finetune-an-llm), [pretrain](#pretrain-an-llm), [evaluate](#use-an-llm), and [deploy](#deploy-an-llm) [20+ LLMs](#choose-from-20-llms) **on your own data**. It features highly-optimized [training recipes](#training-recipes) for the world's most powerful open-source large language models (LLMs). We reimplemented all model architectures and training recipes from scratch for 4 reasons: 1. Remove all abstraction layers and have single file implementations. 2. Guarantee Apache 2.0 compliance to enable enterprise use without limits. -3. Optimized each model architectural detail to maximize performance, reduce costs, and speed up training. +3. Optimized each model's architectural detail to maximize performance, reduce costs, and speed up training. 4. Highly-optimized [recipe configs](#training-recipes) we have tested at enterprise scale. --- @@ -50,6 +48,7 @@ LitGPT has 🤯 **custom, from-scratch implementations** of [20+ LLMs](tutorials | Model | Model size | Author | Reference | |----|----|----|----| +| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) | | Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) | | Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) | | Mistral | 7B | Mistral AI | [Mistral website](https://mistral.ai/) | @@ -72,6 +71,7 @@ LitGPT has 🤯 **custom, from-scratch implementations** of [20+ LLMs](tutorials | Function Calling Llama 2 | 7B | Trelis | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) | | Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) | | Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 
2023](https://arxiv.org/abs/2307.09288) | +| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) | | LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) | | Mistral | 7B | Mistral AI | [Mistral website](https://mistral.ai/) | | Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch) | @@ -114,21 +114,22 @@ pip install -e '.[all]' --- -# Get started +# Quick start After installing LitGPT, select the model and action you want to take on that model (finetune, pretrain, evaluate, deploy, etc...): ```bash # ligpt [action] [model] -litgpt download mistralai/Mistral-7B-Instruct-v0.2 -litgpt chat mistralai/Mistral-7B-Instruct-v0.2 -litgpt finetune mistralai/Mistral-7B-Instruct-v0.2 -litgpt pretrain mistralai/Mistral-7B-Instruct-v0.2 -litgpt serve mistralai/Mistral-7B-Instruct-v0.2 +litgpt download meta-llama/Meta-Llama-3-8B-Instruct +litgpt chat meta-llama/Meta-Llama-3-8B-Instruct +litgpt finetune meta-llama/Meta-Llama-3-8B-Instruct +litgpt pretrain meta-llama/Meta-Llama-3-8B-Instruct +litgpt serve meta-llama/Meta-Llama-3-8B-Instruct ```   -### Use an LLM +### Use an LLM for inference +Use LLMs for inference to test its chatting capabilities, run evaluations, or extract embeddings, etc... Here's an example showing how to use the Mistral 7B LLM. @@ -155,14 +156,20 @@ For more information, refer to the [download](tutorials/download_model_weights.m ### Finetune an LLM [Finetune](tutorials/finetune.md) a model to specialize it on your own custom dataset: + + Open In Studio + + +  + ```bash # 1) Download a pretrained model litgpt download --repo_id microsoft/phi-2 # 2) Finetune the model -curl -L https://huggingface.co/datasets/medalpaca/medical_meadow_health_advice/raw/main/medical_meadow_health_advice.json -o my_custom_dataset.json +curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json -litgpt finetune lora \ +litgpt finetune \ --checkpoint_dir checkpoints/microsoft/phi-2 \ --data JSON \ --data.json_path my_custom_dataset.json \ @@ -174,9 +181,17 @@ litgpt chat \ --checkpoint_dir out/phi-2-lora/final ``` +  + ### Pretrain an LLM Train an LLM from scratch on your own data via pretraining: + +Open In Studio + + +  + ```bash mkdir -p custom_texts curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt @@ -201,10 +216,19 @@ litgpt chat \ --checkpoint_dir out/custom-model/final ``` +  + ### Continue pretraining an LLM -This is another way of finetuning that specialize an already pretrained model by training on custom data: +This is another way of finetuning that specializes an already pretrained model by training on custom data: -``` + + +Open In Studio + + +  + +```bash mkdir -p custom_texts curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt @@ -215,6 +239,7 @@ litgpt download --repo_id EleutherAI/pythia-160m # 2) Continue pretraining the model litgpt pretrain \ --model_name pythia-160m \ + --tokenizer_dir checkpoints/EleutherAI/pythia-160m \ --initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \ --data TextFiles \ --data.train_data_path "custom_texts/" \ @@ -228,6 +253,37 @@ litgpt chat \   +### Deploy an LLM +Once you're ready to deploy a finetuned LLM, run this command: + + + Open In Studio + + +  + +```bash +# locate the checkpoint to your finetuned or 
pretrained model and call the `serve` command: +litgpt serve --checkpoint_dir path/to/your/checkpoint/microsoft/phi-2 + +# Alternative: if you haven't finetuned, download any checkpoint to deploy it: +litgpt download --repo_id microsoft/phi-2 +litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2 +``` + +Test the server in a separate terminal and integrate the model API into your AI product: +```python +# 3) Use the server (in a separate session) +import requests, json + response = requests.post( + "http://127.0.0.1:8000/predict", + json={"prompt": "Fix typos in the following sentence: Exampel input"} +) +print(response.json()["output"]) +``` + +  + > [!NOTE] > **[Read the full docs](tutorials/0_to_litgpt.md)**. @@ -267,7 +323,7 @@ Browse all training recipes [here](config_hub). ### Example ```bash -litgpt finetune lora \ +litgpt finetune \ --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml ``` @@ -422,7 +478,7 @@ seed: 1337 Override any parameter in the CLI: ```bash -litgpt finetune lora \ +litgpt finetune \ --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml \ --lora_r 4 ``` diff --git a/config_hub/finetune/README.md b/config_hub/finetune/README.md index fc82e0854b..2aaa2d3b79 100644 --- a/config_hub/finetune/README.md +++ b/config_hub/finetune/README.md @@ -22,6 +22,10 @@ For more information, see the [Dealing with out-of-memory (OOM) errors](../../tu | llama-2-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.814 | 13.68 GB | 512 | 2 | bfloat16 | 45.68 min (A10G) | | llama-2-7b/full.yaml | 7B | Alpaca 2k | 1 | 0.941 | 26.81 GB | 512 | 4 | bfloat16 | 1.78 min (4xA100) | | | | | | | | | | | | +| llama-3-8b/lora.yaml | 8B | Alpaca 2k | 2 | 0.890 | 19.73 GB | 512 | 1 | bfloat16 | 14.80 min (A10G) | +| llama-3-8b/qlora.yaml | 8B | Alpaca 2k | 2 | 0.941 | 17.41 GB | 512 | 2 | bfloat16 | 22.34 min (A10G) | +| llama-3-8b/full.yaml | 8B | Alpaca 2k | 1 | 1.451 | 35.48 GB | 512 | 4 | bfloat16 | 2.14 min (4xA100) | +| | | | | | | | | | | | mistral-7b/lora.yaml (v0.1) | 7B | Alpaca 2k | 4 | 0.796 | 20.65 GB | 512 | 2 | bfloat16 | 31.04 min (1xA10G) | | mistral-7b/qlora.yaml (v0.1) | 7B | Alpaca 2k | 4 | 0.803 | 14.29 GB | 512 | 2 | bfloat16 | 44.69 min (1xA10G) | | | | | | | | | | | | diff --git a/config_hub/finetune/llama-3-8b/full.yaml b/config_hub/finetune/llama-3-8b/full.yaml new file mode 100644 index 0000000000..11aebcb155 --- /dev/null +++ b/config_hub/finetune/llama-3-8b/full.yaml @@ -0,0 +1,95 @@ + +# The path to the base model's checkpoint directory to load for finetuning. (type: , default: checkpoints/stabilityai/stablelm-base-alpha-3b) +checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3-8B + +# Directory in which to save checkpoints and logs. (type: , default: out/finetune/full) +out_dir: out/finetune/full-llama-3-8b + +# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null) +precision: bf16-true + +# How many devices/GPUs to use (type: Union[int, str], default: 1) +devices: 4 + +# Path to a checkpoint directory to resume from in case training was interrupted, or ``True`` to resume +# from the latest checkpoint in ``out_dir``. (type: Union[bool, Path], default: False) +resume: false + +# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``. 
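+# The ``class_path``/``init_args`` layout below follows the jsonargparse convention
+# used by LitGPT configs, so other ``litgpt.data`` classes can be substituted here.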
+data: + class_path: litgpt.data.Alpaca2k + init_args: + mask_prompt: false + prompt_style: alpaca + ignore_index: -100 + seed: 42 + num_workers: 4 + +# Training-related arguments. See ``litgpt.args.TrainArgs`` for details +train: + + # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000) + save_interval: 200 + + # Number of iterations between logging calls (type: int, default: 1) + log_interval: 1 + + # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 64) + global_batch_size: 64 + + # Number of samples per data-parallel rank (type: int, default: 1) + micro_batch_size: 4 + + # Number of iterations with learning rate warmup active (type: int, default: 100) + lr_warmup_steps: 25 + + # Number of epochs to train on (type: Optional[int], default: 5) + epochs: 1 + + # Total number of tokens to train on (type: Optional[int], default: null) + max_tokens: + + # Limits the number of optimizer steps to run. (type: Optional[int], default: null) + max_steps: + + # Limits the length of samples. Off by default (type: Optional[int], default: null) + max_seq_length: 512 + + # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null) + tie_embeddings: + + # (type: float, default: 0.003) + learning_rate: 0.0002 + + # (type: float, default: 0.02) + weight_decay: 0.1 + + # (type: float, default: 0.9) + beta1: 0.9 + + # (type: float, default: 0.95) + beta2: 0.95 + + # (type: Optional[float], default: null) + max_norm: + + # (type: float, default: 6e-05) + min_lr: 6.0e-05 + +# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details +eval: + + # Number of optimizer steps between evaluation calls (type: int, default: 600) + interval: 25 + + # Number of tokens to generate (type: Optional[int], default: 100) + max_new_tokens: 100 + + # Number of iterations (type: int, default: 100) + max_iters: 100 + +# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv) +logger_name: csv + +# The random seed to use for reproducibility. (type: int, default: 1337) +seed: 1337 diff --git a/config_hub/finetune/llama-3-8b/lora.yaml b/config_hub/finetune/llama-3-8b/lora.yaml new file mode 100644 index 0000000000..700a3b62f4 --- /dev/null +++ b/config_hub/finetune/llama-3-8b/lora.yaml @@ -0,0 +1,121 @@ + +# The path to the base model's checkpoint directory to load for finetuning. (type: , default: checkpoints/stabilityai/stablelm-base-alpha-3b) +checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3-8B + +# Directory in which to save checkpoints and logs. (type: , default: out/lora) +out_dir: out/finetune/lora-llama-3-8b + +# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null) +precision: bf16-true + +# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null) +quantize: + +# How many devices/GPUs to use. (type: Union[int, str], default: 1) +devices: 1 + +# The LoRA rank. (type: int, default: 8) +lora_r: 32 + +# The LoRA alpha. (type: int, default: 16) +lora_alpha: 16 + +# The LoRA dropout value. (type: float, default: 0.05) +lora_dropout: 0.05 + +# Whether to apply LoRA to the query weights in attention. (type: bool, default: True) +lora_query: true + +# Whether to apply LoRA to the key weights in attention. 
(type: bool, default: False) +lora_key: false + +# Whether to apply LoRA to the value weights in attention. (type: bool, default: True) +lora_value: true + +# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False) +lora_projection: false + +# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False) +lora_mlp: false + +# Whether to apply LoRA to output head in GPT. (type: bool, default: False) +lora_head: false + +# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``. +data: + class_path: litgpt.data.Alpaca2k + init_args: + mask_prompt: false + prompt_style: alpaca + ignore_index: -100 + seed: 42 + num_workers: 4 + +# Training-related arguments. See ``litgpt.args.TrainArgs`` for details +train: + + # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000) + save_interval: 200 + + # Number of iterations between logging calls (type: int, default: 1) + log_interval: 1 + + # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128) + global_batch_size: 8 + + # Number of samples per data-parallel rank (type: int, default: 4) + micro_batch_size: 1 + + # Number of iterations with learning rate warmup active (type: int, default: 100) + lr_warmup_steps: 10 + + # Number of epochs to train on (type: Optional[int], default: 5) + epochs: 2 + + # Total number of tokens to train on (type: Optional[int], default: null) + max_tokens: + + # Limits the number of optimizer steps to run. (type: Optional[int], default: null) + max_steps: + + # Limits the length of samples. Off by default (type: Optional[int], default: null) + max_seq_length: 512 + + # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null) + tie_embeddings: + + # (type: float, default: 0.0003) + learning_rate: 0.0002 + + # (type: float, default: 0.02) + weight_decay: 0.0 + + # (type: float, default: 0.9) + beta1: 0.9 + + # (type: float, default: 0.95) + beta2: 0.95 + + # (type: Optional[float], default: null) + max_norm: + + # (type: float, default: 6e-05) + min_lr: 6.0e-05 + +# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details +eval: + + # Number of optimizer steps between evaluation calls (type: int, default: 100) + interval: 100 + + # Number of tokens to generate (type: Optional[int], default: 100) + max_new_tokens: 100 + + # Number of iterations (type: int, default: 100) + max_iters: 100 + +# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv) +logger_name: csv + +# The random seed to use for reproducibility. (type: int, default: 1337) +seed: 1337 diff --git a/config_hub/finetune/llama-3-8b/qlora.yaml b/config_hub/finetune/llama-3-8b/qlora.yaml new file mode 100644 index 0000000000..1da95eaac5 --- /dev/null +++ b/config_hub/finetune/llama-3-8b/qlora.yaml @@ -0,0 +1,123 @@ + +# The path to the base model's checkpoint directory to load for finetuning. (type: , default: checkpoints/stabilityai/stablelm-base-alpha-3b) +checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3-8B + +# Directory in which to save checkpoints and logs. (type: , default: out/lora) +out_dir: out/finetune/qlora-llama3-8b + +# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null) +precision: bf16-true + +# If set, quantize the model with this algorithm. 
See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null) +quantize: bnb.nf4 + +# How many devices/GPUs to use. (type: Union[int, str], default: 1) +devices: 1 + +# The LoRA rank. (type: int, default: 8) +lora_r: 32 + +# The LoRA alpha. (type: int, default: 16) +lora_alpha: 16 + +# The LoRA dropout value. (type: float, default: 0.05) +lora_dropout: 0.05 + +# Whether to apply LoRA to the query weights in attention. (type: bool, default: True) +lora_query: true + +# Whether to apply LoRA to the key weights in attention. (type: bool, default: False) +lora_key: false + +# Whether to apply LoRA to the value weights in attention. (type: bool, default: True) +lora_value: true + +# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False) +lora_projection: false + +# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False) +lora_mlp: false + +# Whether to apply LoRA to output head in GPT. (type: bool, default: False) +lora_head: false + +# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``. +data: + class_path: litgpt.data.Alpaca2k + init_args: + mask_prompt: false + val_split_fraction: 0.05 + prompt_style: alpaca + ignore_index: -100 + seed: 42 + num_workers: 4 + download_dir: data/alpaca2k + +# Training-related arguments. See ``litgpt.args.TrainArgs`` for details +train: + + # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000) + save_interval: 200 + + # Number of iterations between logging calls (type: int, default: 1) + log_interval: 1 + + # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128) + global_batch_size: 8 + + # Number of samples per data-parallel rank (type: int, default: 4) + micro_batch_size: 2 + + # Number of iterations with learning rate warmup active (type: int, default: 100) + lr_warmup_steps: 10 + + # Number of epochs to train on (type: Optional[int], default: 5) + epochs: 2 + + # Total number of tokens to train on (type: Optional[int], default: null) + max_tokens: + + # Limits the number of optimizer steps to run (type: Optional[int], default: null) + max_steps: + + # Limits the length of samples (type: Optional[int], default: null) + max_seq_length: 512 + + # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null) + tie_embeddings: + + # (type: float, default: 0.0003) + learning_rate: 0.0002 + + # (type: float, default: 0.02) + weight_decay: 0.0 + + # (type: float, default: 0.9) + beta1: 0.9 + + # (type: float, default: 0.95) + beta2: 0.95 + + # (type: Optional[float], default: null) + max_norm: + + # (type: float, default: 6e-05) + min_lr: 6.0e-05 + +# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details +eval: + + # Number of optimizer steps between evaluation calls (type: int, default: 100) + interval: 100 + + # Number of tokens to generate (type: Optional[int], default: 100) + max_new_tokens: 100 + + # Number of iterations (type: int, default: 100) + max_iters: 100 + +# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv) +logger_name: csv + +# The random seed to use for reproducibility. 
(type: int, default: 1337) +seed: 1337 diff --git a/litgpt/__main__.py b/litgpt/__main__.py index 59d53ac904..821c1f5801 100644 --- a/litgpt/__main__.py +++ b/litgpt/__main__.py @@ -1,4 +1,5 @@ # Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file. +import sys from typing import TYPE_CHECKING, Any @@ -24,6 +25,8 @@ from litgpt.scripts.download import download_from_hub as download_fn from litgpt.scripts.merge_lora import merge_lora as merge_lora_fn from litgpt.eval.evaluate import convert_and_evaluate as evaluate_fn +from litgpt.deploy.serve import run_server as serve_fn + if TYPE_CHECKING: from jsonargparse import ArgumentParser @@ -39,6 +42,12 @@ def _new_parser(**kwargs: Any) -> "ArgumentParser": return parser +def _rewrite_argv_for_default_subcommand(parser_data: dict, command: str, subcommand: str) -> None: + """Rewrites the `sys.argv` such that `litgpt command` defaults to `litgpt command subcommand`.""" + if len(sys.argv) > 2 and sys.argv[1] == command and sys.argv[2] not in parser_data[command].keys(): + sys.argv.insert(2, subcommand) + + def main() -> None: parser_data = { "download": {"help": "Download weights or tokenizer data from the Hugging Face Hub.", "fn": download_fn}, @@ -80,6 +89,7 @@ def main() -> None: }, "merge_lora": {"help": "Merges the LoRA weights with the base model.", "fn": merge_lora_fn}, "evaluate": {"help": "Evaluate a model with the LM Evaluation Harness.", "fn": evaluate_fn}, + "serve": {"help": "Serve and deploy a model with LitServe.", "fn": serve_fn}, } from jsonargparse import set_config_read_mode, set_docstring_parse_options @@ -87,6 +97,8 @@ def main() -> None: set_docstring_parse_options(attribute_docstrings=True) set_config_read_mode(urls_enabled=True) + _rewrite_argv_for_default_subcommand(parser_data, "finetune", "lora") + root_parser = _new_parser(prog="litgpt") # register level 1 subcommands and level 2 subsubcommands. 
If there are more levels in the future we would want to diff --git a/litgpt/config.py b/litgpt/config.py index 0a4234222d..e03fa8ae34 100644 --- a/litgpt/config.py +++ b/litgpt/config.py @@ -836,6 +836,56 @@ def norm_class(self) -> Type: copy["name"] = c["name"].format(kind) copy["hf_config"]["name"] = c["hf_config"]["name"].format(kind) configs.append(copy) + + +############### +# Meta LLaMA 3 +############### +llama_3 = [ + # https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/config.json + dict( + name="Llama-3-8B{}", + hf_config=dict(org="meta-llama", name="Meta-Llama-3-8B{}"), + block_size=8192, + vocab_size=128000, + padded_vocab_size=128256, + n_layer=32, + n_head=32, + n_query_groups=8, + rotary_percentage=1.0, + parallel_residual=False, + bias=False, + norm_class_name="RMSNorm", + mlp_class_name="LLaMAMLP", + intermediate_size=14336, + rope_base=500000, + ), + # https://huggingface.co/meta-llama/Meta-Llama-3-70B/blob/main/config.json + dict( + name="Llama-3-70B{}", + hf_config=dict(org="meta-llama", name="Meta-Llama-3-70B{}"), + block_size=8192, + vocab_size=128000, + padded_vocab_size=128256, + n_layer=80, + n_head=64, + n_embd=8192, + n_query_groups=8, + rotary_percentage=1.0, + parallel_residual=False, + bias=False, + norm_class_name="RMSNorm", + mlp_class_name="LLaMAMLP", + intermediate_size=28672, + rope_base=500000, + ), +] +for c in llama_3: + for kind in ("", "-Instruct"): + copy = deepcopy(c) + copy["name"] = c["name"].format(kind) + copy["hf_config"]["name"] = c["hf_config"]["name"].format(kind) + configs.append(copy) ############### diff --git a/litgpt/deploy/serve.py b/litgpt/deploy/serve.py new file mode 100644 index 0000000000..9df48ad98d --- /dev/null +++ b/litgpt/deploy/serve.py @@ -0,0 +1,137 @@ +# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file. +from pathlib import Path +from typing import Dict, Any, Optional, Literal +from litgpt.utils import check_valid_checkpoint_dir + +import lightning as L +import torch +from litserve import LitAPI, LitServer + +from litgpt.model import GPT +from litgpt.config import Config +from litgpt.tokenizer import Tokenizer +from litgpt.generate.base import generate +from litgpt.prompts import load_prompt_style, has_prompt_style, PromptStyle +from litgpt.utils import load_checkpoint, CLI, get_default_supported_precision + + +class SimpleLitAPI(LitAPI): + def __init__(self, + checkpoint_dir: Path, + precision: Optional[str] = None, + temperature: float = 0.8, + top_k: int = 50, + max_new_tokens: int = 50) -> None: + + super().__init__() + self.checkpoint_dir = checkpoint_dir + self.precision = precision + self.temperature = temperature + self.top_k = top_k + self.max_new_tokens = max_new_tokens + + def setup(self, device: str) -> None: + # Setup the model so it can be called in `predict`. 
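+ # The steps below: read the saved model config, pick a precision, create a
+ # single-device Fabric, load the tokenizer and prompt style, build the GPT
+ # module with a batch-size-1 KV cache for generation, and finally load the
+ # checkpoint weights and record the target device.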
+ config = Config.from_file(self.checkpoint_dir / "model_config.yaml") + device = torch.device(device) + torch.set_float32_matmul_precision("high") + + precision = self.precision or get_default_supported_precision(training=False) + + fabric = L.Fabric( + accelerator=device.type, + devices=1 if device.type=="cpu" else [device.index], # TODO: Update once LitServe supports "auto" + precision=precision, + ) + checkpoint_path = self.checkpoint_dir / "lit_model.pth" + self.tokenizer = Tokenizer(self.checkpoint_dir) + self.prompt_style = ( + load_prompt_style(self.checkpoint_dir) + if has_prompt_style(self.checkpoint_dir) + else PromptStyle.from_config(config) + ) + with fabric.init_module(empty_init=True): + model = GPT(config) + with fabric.init_tensor(): + # enable the kv cache + model.set_kv_cache(batch_size=1) + model.eval() + + self.model = fabric.setup_module(model) + load_checkpoint(fabric, self.model, checkpoint_path) + self.device = fabric.device + + def decode_request(self, request: Dict[str, Any]) -> Any: + # Convert the request payload to your model input. + prompt = request["prompt"] + prompt = self.prompt_style.apply(prompt) + encoded = self.tokenizer.encode(prompt, device=self.device) + return encoded + + def predict(self, inputs: torch.Tensor) -> Any: + # Run the model on the input and return the output. + prompt_length = inputs.size(0) + max_returned_tokens = prompt_length + self.max_new_tokens + + y = generate( + self.model, + inputs, + max_returned_tokens, + temperature=self.temperature, + top_k=self.top_k, + eos_id=self.tokenizer.eos_id + ) + + for block in self.model.transformer.h: + block.attn.kv_cache.reset_parameters() + return y + + def encode_response(self, output: torch.Tensor) -> Dict[str, Any]: + # Convert the model output to a response payload. + decoded_output = self.tokenizer.decode(output) + return {"output": decoded_output} + + +def run_server( + checkpoint_dir: Path = Path("checkpoints"), + precision: Optional[str] = None, + temperature: float = 0.8, + top_k: int = 200, + max_new_tokens: int = 50, + devices: int = 1, + accelerator: str = "cuda", + port: int = 8000 +) -> None: + """Serve a LitGPT model using LitServe + + Arguments: + checkpoint_dir: The checkpoint directory to load the model from. + precision: Optional precision setting to instantiate the model weights in. By default, this will + automatically be inferred from the metadata in the given ``checkpoint_dir`` directory. + temperature: Temperature setting for the text generation. Value above 1 increase randomness. + Values below 1 decrease randomness. + top_k: The size of the pool of potential next tokens. Values larger than 1 result in more novel + generated text but can also lead to more incoherent texts. + max_new_tokens: The number of generation steps to take. + devices: How many devices/GPUs to use. + accelerator: The type of accelerator to use. For example, "cuda" or "cpu". + port: The network port number on which the model is configured to be served. 
+ """ + check_valid_checkpoint_dir(checkpoint_dir, model_filename="lit_model.pth") + + server = LitServer( + SimpleLitAPI( + checkpoint_dir=checkpoint_dir, + precision=precision, + temperature=temperature, + top_k=top_k, + max_new_tokens=max_new_tokens, + ), + accelerator=accelerator, + devices=devices) + + server.run(port=port) + + +if __name__ == "__main__": + CLI(run_server) diff --git a/litgpt/prompts.py b/litgpt/prompts.py index d827413913..04a0551cd1 100644 --- a/litgpt/prompts.py +++ b/litgpt/prompts.py @@ -200,6 +200,24 @@ def apply(self, prompt: str, **kwargs: str) -> str: ) +class Llama3(PromptStyle): + def apply(self, prompt: str, **kwargs: str) -> str: + # https://github.com/meta-llama/llama3/blob/359887376f0aaf30e433f23e25df858d8c2a9833/llama/tokenizer.py#L202-L229 + return ( + "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n" + "You are a helpful assistant.<|eot_id|>\n" # The system prompt is optional + "<|start_header_id|>user<|end_header_id|>\n\n" + f"{prompt}<|eot_id|>\n" + "<|start_header_id|>assistant<|end_header_id|>\n\n" + ) + + def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]: + return ( + [tokenizer.eos_id], + [tokenizer.token_to_id("<|eot_id|>")], + ) + + class FreeWilly2(PromptStyle): def apply(self, prompt: str, **kwargs: str) -> str: return ( @@ -316,6 +334,8 @@ def model_name_to_prompt_style(model_name: str) -> PromptStyle: return Llama2FunctionCalling() if re.search("Llama-2.*-chat", model_name): return Llama2() + if re.search("Llama-3.*-Instruct", model_name): + return Llama3() if re.search("FreeWilly2", model_name): return FreeWilly2() if re.search("Platypus", model_name): diff --git a/pyproject.toml b/pyproject.toml index 3b22a124ea..d2c26fc33e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -12,6 +12,7 @@ dependencies = [ "torch>=2.2.0", "lightning==2.3.0.dev20240328", "jsonargparse[signatures]>=4.27.6", + "litserve==0.0.0.dev2", # imported by litgpt.deploy ] [project.urls] diff --git a/tests/test_cli.py b/tests/test_cli.py index 2c994fcf96..f95841ddc0 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -15,7 +15,7 @@ def test_cli(): main() out = out.getvalue() assert "usage: litgpt" in out - assert "{download,chat,finetune,pretrain,generate,convert,merge_lora,evaluate}" in out + assert "{download,chat,finetune,pretrain,generate,convert,merge_lora,evaluate,serve}" in out assert ( """Available subcommands: download Download weights or tokenizer data from the Hugging @@ -23,19 +23,8 @@ def test_cli(): chat Chat with a model.""" in out ) - assert ("""evaluate Evaluate a model with the LM Evaluation Harness.""") in out - - out = StringIO() - with pytest.raises(SystemExit), redirect_stdout(out), mock.patch("sys.argv", ["litgpt", "finetune", "-h"]): - main() - out = out.getvalue() - assert ( - """Available subcommands: - lora Finetune a model with LoRA. 
- full Finetune a model.""" - in out - ) - + assert """evaluate Evaluate a model with the LM Evaluation Harness.""" in out + assert """serve Serve and deploy a model with LitServe.""" in out out = StringIO() with pytest.raises(SystemExit), redirect_stdout(out), mock.patch("sys.argv", ["litgpt", "finetune", "lora", "-h"]): main() @@ -61,3 +50,13 @@ def test_cli(): Optional[int], default: 3000000000000)""" in out ) + + +def test_rewrite_finetune_command(): + out1 = StringIO() + with pytest.raises(SystemExit), redirect_stdout(out1), mock.patch("sys.argv", ["litgpt", "fineune", "-h"]): + main() + out2 = StringIO() + with pytest.raises(SystemExit), redirect_stdout(out2), mock.patch("sys.argv", ["litgpt", "fineune", "lora", "-h"]): + main() + assert out1.getvalue() == out2.getvalue() diff --git a/tests/test_model.py b/tests/test_model.py index 7bc0ccb5b4..0537098342 100644 --- a/tests/test_model.py +++ b/tests/test_model.py @@ -206,7 +206,13 @@ def test_against_original_open_llama_3b(device, dtype): @torch.inference_mode() @pytest.mark.parametrize( "ours_kwargs", - [{"name": "Llama-2-7b-hf"}, {"name": "CodeLlama-7b-hf"}, {"name": "Llama-2-70b-chat-hf", "n_query_groups": 1}], + [ + {"name": "Llama-2-7b-hf"}, + {"name": "CodeLlama-7b-hf"}, + {"name": "Llama-2-70b-chat-hf", "n_query_groups": 1}, + {"name": "Llama-3-8B"}, + {"name": "Llama-3-8B-Instruct"} + ], ) @pytest.mark.parametrize( ("device", "dtype"), @@ -224,7 +230,7 @@ def test_against_original_open_llama_3b(device, dtype): ), ], ) -def test_against_hf_llama2(ours_kwargs, device, dtype): +def test_against_hf_llama_2_and_3(ours_kwargs, device, dtype): torch.set_default_dtype(dtype) ours_config = Config.from_name( diff --git a/tests/test_prompts.py b/tests/test_prompts.py index 3250ce4801..20f2c84e0c 100644 --- a/tests/test_prompts.py +++ b/tests/test_prompts.py @@ -50,6 +50,8 @@ def test_prompt_style_from_config(): "Llama-2-7b-chat-hf", "Llama-2-13b-chat-hf", "Llama-2-70b-chat-hf", + "Llama-3-8B-Instruct", + "Llama-3-70B-Instruct", "Gemma-2b-it", "Gemma-7b-it", "FreeWilly2", diff --git a/tests/test_serve.py b/tests/test_serve.py new file mode 100644 index 0000000000..46a109c807 --- /dev/null +++ b/tests/test_serve.py @@ -0,0 +1,42 @@ +# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file. 
+from dataclasses import asdict +import shutil + +from lightning.fabric import seed_everything +from fastapi.testclient import TestClient +from litserve.server import LitServer +import torch +import yaml + + +from litgpt import GPT, Config +from litgpt.deploy.serve import SimpleLitAPI +from litgpt.scripts.download import download_from_hub + + +def test_simple(tmp_path): + + # Create model checkpoint + seed_everything(123) + ours_config = Config.from_name("pythia-14m") + download_from_hub(repo_id="EleutherAI/pythia-14m", tokenizer_only=True, checkpoint_dir=tmp_path) + shutil.move(str(tmp_path / "EleutherAI" / "pythia-14m" / "tokenizer.json"), str(tmp_path)) + shutil.move(str(tmp_path / "EleutherAI" / "pythia-14m" / "tokenizer_config.json"), str(tmp_path)) + ours_model = GPT(ours_config) + checkpoint_path = tmp_path / "lit_model.pth" + torch.save(ours_model.state_dict(), checkpoint_path) + config_path = tmp_path / "model_config.yaml" + with open(config_path, "w", encoding="utf-8") as fp: + yaml.dump(asdict(ours_config), fp) + + accelerator = "cpu" + server = LitServer( + SimpleLitAPI(checkpoint_dir=tmp_path, temperature=1, top_k=1), + accelerator=accelerator, devices=1, timeout=60 + ) + + with TestClient(server.app) as client: + response = client.post("/predict", json={"prompt": "Hello world"}) + # Model is a small random model, not trained, hence the gibberish. + # We are just testing that the server works. + assert response.json()["output"][:19] == "Hello world statues" diff --git a/tutorials/0_to_litgpt.md b/tutorials/0_to_litgpt.md index 337bf37049..e5e1c7c765 100644 --- a/tutorials/0_to_litgpt.md +++ b/tutorials/0_to_litgpt.md @@ -464,6 +464,44 @@ litgpt evaluate \ (A list of supported tasks can be found [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md).) +  +## Deploy LLMs + +You can deploy LitGPT LLMs using your tool of choice. Below is an example using LitGPT built-in serving capabilities: + + +```bash +# 1) Download a pretrained model (alternatively, use your own finetuned model) +litgpt download --repo_id microsoft/phi-2 + +# 2) Start the server +litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2 +``` + +```python +# 3) Use the server (in a separate session) +import requests, json + response = requests.post( + "http://127.0.0.1:8000/predict", + json={"prompt": "Fix typos in the following sentence: Exampel input"} +) +print(response.json()["output"]) +``` + +This prints: + +``` +Instruct: Fix typos in the following sentence: Exampel input +Output: Example input. +``` + + +  +**More information and additional resources** + +- [tutorials/deploy](deploy.md): A full deployment tutorial and example + +   ## Converting LitGPT model weights to `safetensors` format diff --git a/tutorials/deploy.md b/tutorials/deploy.md new file mode 100644 index 0000000000..1b1495fde7 --- /dev/null +++ b/tutorials/deploy.md @@ -0,0 +1,49 @@ +# Serve and Deploy LLMs + +This document shows how you can serve a LitGPT for deployment. + +  +## Serve an LLM + +This section illustrates how we can set up an inference server for a phi-2 LLM using `litgpt serve` that is minimal and highly scalable. 
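
Under the hood, `litgpt serve` dispatches to the `run_server` helper in `litgpt/deploy/serve.py`. If you prefer to launch the server from Python rather than through the CLI, a minimal sketch looks like this (the phi-2 checkpoint path is illustrative; point it at any downloaded or finetuned checkpoint):

```python
from pathlib import Path
from litgpt.deploy.serve import run_server

# Starts a LitServe inference server on port 8000 for the given checkpoint.
run_server(
    checkpoint_dir=Path("checkpoints/microsoft/phi-2"),
    temperature=0.8,      # values above 1 increase randomness, below 1 reduce it
    top_k=200,            # size of the pool of candidate next tokens
    max_new_tokens=50,    # number of generation steps per request
    accelerator="cuda",   # use "cpu" if no GPU is available
    port=8000,
)
```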
+ + +  +## Step 1: Start the inference server + + +```bash +# 1) Download a pretrained model (alternatively, use your own finetuned model) +litgpt download --repo_id microsoft/phi-2 + +# 2) Start the server +litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2 +``` + +> [!TIP] +> Use `litgpt serve --help` to display additional options, including the port, devices, LLM temperature setting, and more. + + +  +## Step 2: Query the inference server + +You can now send requests to the inference server you started in step 2. For example, in a new Python session, we can send requests to the inference server as follows: + + +```python +import requests, json + +response = requests.post( + "http://127.0.0.1:8000/predict", + json={"prompt": "Fix typos in the following sentence: Exampel input"} +) + +print(response.json()["output"]) +``` + +Executing the code above prints the following output: + +``` +Instruct: Fix typos in the following sentence: Exampel input +Output: Example input. +``` diff --git a/tutorials/download_model_weights.md b/tutorials/download_model_weights.md index d1c320ac33..45c9c7d50c 100644 --- a/tutorials/download_model_weights.md +++ b/tutorials/download_model_weights.md @@ -3,29 +3,30 @@ LitGPT supports a variety of LLM architectures with publicly available weights. You can download model weights and access a list of supported models using the LitGPT `download.py` script. -| Model | Model size | Reference | -|----------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------| -| CodeGemma by Google | 7B | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) | -| Code Llama by Meta AI | 7B, 13B, 34B, 70B | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) | -| Dolly by Databricks | 3B, 7B, 12B | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) | -| Falcon by TII UAE | 7B, 40B, 180B | [TII 2023](https://falconllm.tii.ae) | -| FreeWilly2 (Stable Beluga 2) by Stability AI | 70B | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) | -| Function Calling Llama 2 by Trelis | 7B | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) | -| Gemma by Google | 2B, 7B | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) | -| Llama 2 by Meta AI | 7B, 13B, 70B | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) | -| LongChat by LMSYS | 7B, 13B | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) | -| Mistral and Mixtral by Mistral AI | 7B | [Mistral website](https://mistral.ai/) | -| Nous-Hermes by NousResearch | 7B, 13B, 70B | [Org page](https://huggingface.co/NousResearch) | -| OpenLLaMA by OpenLM Research | 3B, 7B, 13B | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) | -| Phi by Microsoft Research | 1.3B, 2.7B | [Li et al. 2023](https://arxiv.org/abs/2309.05463) | -| Platypus by Lee at el. | 7B, 13B, 70B | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) | -| Pythia by EleutherAI | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 
2023](https://arxiv.org/abs/2304.01373) | -| RedPajama-INCITE by Together | 3B, 7B | [Together 2023](https://together.ai/blog/redpajama-models-v1) | -| StableCode by Stability AI | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) | -| StableLM by Stability AI | 3B, 7B | [Stability AI 2023](https://github.com/Stability-AI/StableLM) | -| StableLM Zephyr by Stability AI | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) | -| TinyLlama by Zhang et al. | 1.1B | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) | -| Vicuna by LMSYS | 7B, 13B, 33B | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/) | +| Model | Model size | Reference | +|----------------------------------------------|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------| +| CodeGemma by Google | 7B | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) | +| Code Llama by Meta AI | 7B, 13B, 34B, 70B | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) | +| Dolly by Databricks | 3B, 7B, 12B | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) | +| Falcon by TII UAE | 7B, 40B, 180B | [TII 2023](https://falconllm.tii.ae) | +| FreeWilly2 (Stable Beluga 2) by Stability AI | 70B | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models) | +| Function Calling Llama 2 by Trelis | 7B | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) | +| Gemma by Google | 2B, 7B | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) | +| Llama 2 by Meta AI | 7B, 13B, 70B | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) | +| Llama 3 by Meta AI | 8B, 70B | [Meta AI 2024](https://github.com/meta-llama/llama3) | +| LongChat by LMSYS | 7B, 13B | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/) | +| Mistral and Mixtral by Mistral AI | 7B | [Mistral website](https://mistral.ai/) | +| Nous-Hermes by NousResearch | 7B, 13B, 70B | [Org page](https://huggingface.co/NousResearch) | +| OpenLLaMA by OpenLM Research | 3B, 7B, 13B | [Geng & Liu 2023](https://github.com/openlm-research/open_llama) | +| Phi by Microsoft Research | 1.3B, 2.7B | [Li et al. 2023](https://arxiv.org/abs/2309.05463) | +| Platypus by Lee at el. | 7B, 13B, 70B | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317) | +| Pythia by EleutherAI | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373) | +| RedPajama-INCITE by Together | 3B, 7B | [Together 2023](https://together.ai/blog/redpajama-models-v1) | +| StableCode by Stability AI | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) | +| StableLM by Stability AI | 3B, 7B | [Stability AI 2023](https://github.com/Stability-AI/StableLM) | +| StableLM Zephyr by Stability AI | 3B | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) | +| TinyLlama by Zhang et al. | 1.1B | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama) | +| Vicuna by LMSYS | 7B, 13B, 33B | [Li et al. 
2023](https://lmsys.org/blog/2023-03-30-vicuna/) | @@ -105,6 +106,10 @@ meta-llama/Llama-2-70b-chat-hf meta-llama/Llama-2-70b-hf meta-llama/Llama-2-7b-chat-hf meta-llama/Llama-2-7b-hf +meta-llama/Meta-Llama-3-70B +meta-llama/Meta-Llama-3-70B-Instruct +meta-llama/Meta-Llama-3-8B +meta-llama/Meta-Llama-3-8B-Instruct microsoft/phi-1_5 microsoft/phi-2 mistralai/Mistral-7B-Instruct-v0.1
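
The newly listed `meta-llama/Meta-Llama-3-*` checkpoints can be fetched with `litgpt download --repo_id meta-llama/Meta-Llama-3-8B-Instruct`, or programmatically via the same helper the CLI uses. A minimal sketch is shown below; the Meta weights are gated, so this assumes you have accepted the license on Hugging Face and authenticated (for example with `huggingface-cli login`):

```python
from pathlib import Path
from litgpt.scripts.download import download_from_hub

# Fetches the Llama 3 8B Instruct weights into checkpoints/meta-llama/Meta-Llama-3-8B-Instruct
download_from_hub(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    checkpoint_dir=Path("checkpoints"),
)
```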