From 57991dc091ff6b40f3a2bef2d72e2310f9e08adc Mon Sep 17 00:00:00 2001
From: rasbt
Date: Mon, 15 Apr 2024 21:48:01 +0000
Subject: [PATCH] add docs

---
 README.md                | 24 ++++++++++++++++++-
 tutorials/0_to_litgpt.md | 37 +++++++++++++++++++++++++++++
 tutorials/deploy.md      | 51 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 tutorials/deploy.md

diff --git a/README.md b/README.md
index eaa5fbf7f5..ed5a4fb0dd 100644
--- a/README.md
+++ b/README.md
@@ -144,7 +144,7 @@ litgpt chat \
 ### Continue pretraining an LLM
 
 This is another way of finetuning that specializes an already pretrained model by training on custom data:
-```
+```bash
 mkdir -p custom_texts
 curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
 curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt
@@ -166,6 +166,28 @@ litgpt chat \
   --checkpoint_dir out/custom-model/final
 ```
 
+### Deploy an LLM
+
+This example illustrates how to deploy an LLM using LitGPT:
+
+```bash
+# 1) Download a pretrained model (alternatively, use your own finetuned model)
+litgpt download --repo_id microsoft/phi-2
+
+# 2) Start the server
+litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2
+```
+
+```python
+# 3) Use the server (in a separate session)
+import requests
+response = requests.post(
+    "http://127.0.0.1:8000/predict",
+    json={"prompt": "Fix typos in the following sentence: Exampel input"}
+)
+print(response.content)
+```
+
 &nbsp;
 
 > [!NOTE]
diff --git a/tutorials/0_to_litgpt.md b/tutorials/0_to_litgpt.md
index 337bf37049..8e4e6e1902 100644
--- a/tutorials/0_to_litgpt.md
+++ b/tutorials/0_to_litgpt.md
@@ -464,6 +464,43 @@ litgpt evaluate \
 
 (A list of supported tasks can be found [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md).)
 
+&nbsp;
+## Deploy LLMs
+
+You can deploy LitGPT LLMs using your tool of choice.
+Below is an example using LitGPT's built-in serving capabilities:
+
+
+```bash
+# 1) Download a pretrained model (alternatively, use your own finetuned model)
+litgpt download --repo_id microsoft/phi-2
+
+# 2) Start the server
+litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2
+```
+
+```python
+# 3) Use the server (in a separate session)
+import requests
+response = requests.post(
+    "http://127.0.0.1:8000/predict",
+    json={"prompt": "Fix typos in the following sentence: Exampel input"}
+)
+print(response.content)
+```
+
+This prints:
+
+```
+b'{"output":"Instruct: Fix typos in the following sentence: Exampel input\\nOutput: Example input: Hello World\\n"}'
+```
+
+
+&nbsp;
+**More information and additional resources**
+
+- [tutorials/deploy](deploy.md): A full deployment tutorial and example
+
+
 &nbsp;
 
 ## Converting LitGPT model weights to `safetensors` format
diff --git a/tutorials/deploy.md b/tutorials/deploy.md
new file mode 100644
index 0000000000..10ccb85580
--- /dev/null
+++ b/tutorials/deploy.md
@@ -0,0 +1,51 @@
+# Serve and Deploy LLMs
+
+This document shows how you can serve a LitGPT model for deployment.
+
+&nbsp;
+## Serve an LLM
+
+This section illustrates how to set up a minimal, highly scalable inference server for a phi-2 LLM using `litgpt serve`.
+
+
+&nbsp;
+## Step 1: Start the inference server
+
+
+```bash
+# 1) Download a pretrained model (alternatively, use your own finetuned model)
+litgpt download --repo_id microsoft/phi-2
+
+# 2) Start the server
+litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2
+```
+
+> [!TIP]
+> Use `litgpt serve --help` to display additional options, including the port, devices, LLM temperature setting, and more.
+
+
+&nbsp;
+## Step 2: Query the inference server
+
+You can now send requests to the inference server you started in step 1.
+For example, in a new Python session:
+
+
+```python
+import requests, json
+
+response = requests.post(
+    "http://127.0.0.1:8000/predict",
+    json={"prompt": "Fix typos in the following sentence: Exampel input"}
+)
+
+decoded_string = response.content.decode("utf-8")
+output_str = json.loads(decoded_string)["output"]
+print(output_str)
+```
+
+Executing the code above prints the following output:
+
+```
+Instruct: Fix typos in the following sentence: Exampel input
+Output: Example input.
+```
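The decoding step at the end of the tutorial can be exercised without a running server. The sketch below is an assumption-labeled illustration, not part of the patch: it hard-codes a byte string mimicking the `/predict` payload shape shown above (a JSON object with a single `output` field) in place of `response.content`, then applies the same `decode`/`json.loads` steps:

```python
import json

# Hypothetical raw bytes standing in for response.content from the /predict
# endpoint; the payload shape (one "output" field) mirrors the tutorial output.
raw = b'{"output":"Instruct: Fix typos in the following sentence: Exampel input\\nOutput: Example input.\\n"}'

# Same decoding steps as in the tutorial: bytes -> str -> dict -> output field
decoded_string = raw.decode("utf-8")
output_str = json.loads(decoded_string)["output"]
print(output_str)
```

Note that `\\n` inside the byte literal is a two-character escape in the JSON text, which `json.loads` turns back into a real newline in `output_str`.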