
Merge pull request #174 from janhq/tidy-Nitro
Tidy Nitro docs
tikikun authored Nov 22, 2023
2 parents 0110656 + f72cc2a commit 5503006
Showing 12 changed files with 119 additions and 9,925 deletions.
84 changes: 41 additions & 43 deletions README.md
@@ -17,11 +17,9 @@
- Quick Setup: Approximately 10-second initialization for swift deployment.
- Enhanced Web Framework: Incorporates Drogon, a fast C++ web framework, to boost web service efficiency.

## Documentation

## About Nitro

Nitro is a lightweight integration layer (and soon-to-be inference engine) for cutting-edge inference engines, making deployment of AI models easier than ever before!
Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.

The zipped Nitro binary is only ~3 MB with minimal to no dependencies (for example, CUDA is needed only if you use a GPU), making it well suited for any edge/server deployment 👍.

@@ -40,37 +38,57 @@ The binary of nitro after zipped is only ~3mb in size with none to minimal depen

## Quickstart

**Step 1: Download Nitro**
**Step 1: Install Nitro**

To use Nitro, download the released binaries from the release page below:
- For Linux and MacOS

[![Download Nitro](https://img.shields.io/badge/Download-Nitro-blue.svg)](https://github.com/janhq/nitro/releases)
```bash
curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash -
```

After downloading the release, double-click on the Nitro binary.
- For Windows

**Step 2: Download a Model**
```bash
powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }"
```

Download a Llama model to try the llama.cpp integration. You can find a GGUF model on TheBloke's page below:
**Step 2: Download a Model**

[![Download Model](https://img.shields.io/badge/Download-Model-green.svg)](https://huggingface.co/TheBloke)
```bash
mkdir model && cd model
wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
```

**Step 3: Run Nitro**
**Step 3: Run Nitro server**

Double-click on Nitro to run it. After downloading your model, make sure it's saved to a specific path. Then, make an API call to load your model into Nitro.
```bash title="Run Nitro server"
nitro
```

**Step 4: Load model**

```zsh
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
```bash title="Load model"
curl http://localhost:3928/inferences/llamacpp/loadmodel \
-H 'Content-Type: application/json' \
-d '{
"llama_model_path": "/path/to/your_model.gguf",
"ctx_len": 2048,
"llama_model_path": "/model/llama-2-7b-model.gguf",
"ctx_len": 512,
"ngl": 100,
"embedding": true,
"n_parallel": 4,
"pre_prompt": "A chat between a curious user and an artificial intelligence",
"user_prompt": "USER: ",
"ai_prompt": "ASSISTANT: "
}'
```
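If you prefer scripting the load call instead of using curl, here is a minimal stdlib-only Python sketch (not part of the Nitro docs; the parameter values are copied verbatim from the curl example above):

```python
import json
from urllib import request

# Same payload as the curl example in Step 4.
load_body = {
    "llama_model_path": "/model/llama-2-7b-model.gguf",
    "ctx_len": 512,
    "ngl": 100,
    "embedding": True,
    "n_parallel": 4,
    "pre_prompt": "A chat between a curious user and an artificial intelligence",
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: ",
}
req = request.Request(
    "http://localhost:3928/inferences/llamacpp/loadmodel",
    data=json.dumps(load_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# request.urlopen(req) performs the call once the server from Step 3 is running.
```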

**Step 5: Making an Inference**

```bash title="Nitro Inference"
curl http://localhost:3928/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "Who won the world series in 2020?"
    }
  ]
}'
```
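For client code, the two calls above can be wrapped in a small helper. The following is a stdlib-only Python sketch; `build_request` and `nitro_post` are illustrative names, not part of the Nitro API:

```python
import json
from urllib import request

def build_request(path, body, base_url="http://localhost:3928"):
    # JSON-encode the body, mirroring the curl examples above.
    return request.Request(base_url + path,
                           data=json.dumps(body).encode("utf-8"),
                           headers={"Content-Type": "application/json"})

def nitro_post(path, body):
    # Requires the Nitro server from Step 3 to be running.
    with request.urlopen(build_request(path, body)) as resp:
        return json.load(resp)

# Example usage (with a live server):
# reply = nitro_post("/v1/chat/completions",
#                    {"messages": [{"role": "user",
#                                   "content": "Who won the world series in 2020?"}]})
```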

@@ -89,7 +107,6 @@ Table of parameters
| `system_prompt` | String | The prompt to use for system rules. |
| `pre_prompt` | String | The prompt to use for internal configuration. |


***OPTIONAL***: You can run Nitro on a different port (e.g., 5000 instead of 3928) by launching it manually in the terminal:
```zsh
./nitro 1 127.0.0.1 5000 ([thread_num] [host] [port])
@@ -98,32 +115,13 @@ Table of parameters
- host: the host address, normally 127.0.0.1 (localhost) or 0.0.0.0 (all interfaces)
- port: the port Nitro is deployed on

**Step 4: Perform Inference on Nitro for the First Time**

```zsh
curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--header 'Access-Control-Allow-Origin: *' \
--data '{
"messages": [
{"content": "Hello there 👋", "role": "assistant"},
{"content": "Can you write a long story", "role": "user"}
],
"stream": true,
"model": "gpt-3.5-turbo",
"max_tokens": 2000
}'
```

Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.

## Compile from source
To compile Nitro, please visit [Compile from source](docs/manual_install.md)
To compile Nitro, please visit [Compile from source](docs/new/build-source.md)

### Contact

- For support, please file a GitHub ticket.
- For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
- For long-form inquiries, please email [email protected].
48 changes: 44 additions & 4 deletions docs/docs/examples/chatbox.md
Original file line number Diff line number Diff line change
@@ -2,10 +2,50 @@
title: Nitro with Chatbox
---

:::info COMING SOON
:::
This guide demonstrates how to integrate Nitro with Chatbox, showcasing the compatibility of Nitro with various platforms.

<!--
## What is Chatbox?
Chatbox is a versatile desktop client that supports multiple cutting-edge Large Language Models (LLMs). It is available for Windows, Mac, and Linux operating systems.

## How to use Nitro as backend -->
For more information, please visit the [Chatbox official GitHub page](https://github.com/Bin-Huang/chatbox).


## Downloading and Installing Chatbox

To download and install Chatbox, follow the instructions available at this [link](https://github.com/Bin-Huang/chatbox#download).

## Using Nitro as a Backend

1. Start Nitro server

Open your command line tool and enter:
```bash
nitro
```

> Ensure you are using the latest version of [Nitro](new/install.md).

2. Load the Model

To load the model, use the following command:

```bash
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
    "ctx_len": 512,
    "ngl": 100
  }'
```

3. Configure Chatbox

Adjust the `settings` in Chatbox to connect with Nitro. Change your settings to match the configuration shown in the image below:

![Settings](img/chatbox.PNG)

4. Chat with the Model

Once the setup is complete, you can start chatting with the model using Chatbox. All functions of Chatbox are now enabled with Nitro as the backend.

## Video demo
Binary file added docs/docs/examples/img/chatbox.PNG
6 changes: 1 addition & 5 deletions docs/docs/new/about.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: About Nitro
slug: /docs
slug: /about
---

Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.
@@ -119,7 +119,3 @@ Nitro welcomes contributions in various forms, not just coding. Here are some wa

- [drogon](https://github.com/drogonframework/drogon): The fast C++ web framework
- [llama.cpp](https://github.com/ggerganov/llama.cpp): Inference of LLaMA model in pure C/C++

## FAQ
:::info COMING SOON
:::
1 change: 1 addition & 0 deletions docs/docs/new/architecture.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Architecture
slug: /architecture
---

![Nitro Architecture](img/architecture.drawio.png)
1 change: 1 addition & 0 deletions docs/docs/new/build-source.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Build From Source
slug: /build-source
---

This guide provides step-by-step instructions for building Nitro from source on Linux, macOS, and Windows systems.
20 changes: 20 additions & 0 deletions docs/docs/new/faq.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
---
title: FAQs
slug: /faq
---

### 1. Is Nitro the same as Llama.cpp with an API server?

Yes, that's correct. However, Nitro isn't limited to just Llama.cpp; it will soon integrate multiple other models like Whisper, Bark, and Stable Diffusion, all in a single binary. This eliminates the need for you to develop a separate API server on top of AI models. Nitro is a comprehensive solution, designed for ease of use and efficiency.

### 2. Is Nitro simply Llama-cpp-python?

In essence, yes. However, Nitro isn't bound to Python, which allows you to leverage high-performance software that fully utilizes your system's capabilities. With Nitro, learning how to deploy a Python web server or use FastAPI isn't necessary. The Nitro web server is already fully optimized.

### 3. Why should I switch to Nitro over Ollama?

While Ollama does provide similar functionalities, its design serves a different purpose. Ollama has a larger size (around 200MB) compared to Nitro's 3MB distribution. Nitro's compact size allows for easy embedding into subprocesses, ensuring minimal concerns about package size for your application. This makes Nitro a more suitable choice for applications where efficiency and minimal resource usage are key.

### 4. Why is the model named "chat-gpt-3.5"?

Many applications implement the OpenAI ChatGPT API, and we want Nitro to be versatile for any AI client. While you can use any model name, we've ensured that if you're already using the chatgpt API, switching to Nitro is seamless. Just replace api.openai.com with localhost:3928 in your client settings (like Chatbox, Sillytavern, Oobaboga, etc.), and it will work smoothly with Nitro.
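As a sketch of that drop-in compatibility (illustrative, stdlib-only; not an official client), note that only the host differs between the two endpoints:

```python
import json
from urllib import request
from urllib.parse import urlparse

# Same path and payload shape as the OpenAI API; only the host changes.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"
NITRO_URL = "http://localhost:3928/v1/chat/completions"

body = {"model": "gpt-3.5-turbo",  # model name kept for client compatibility
        "messages": [{"role": "user", "content": "Hello"}]}
req = request.Request(NITRO_URL,
                      data=json.dumps(body).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
# request.urlopen(req) would return an OpenAI-format response from Nitro.
```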
1 change: 1 addition & 0 deletions docs/docs/new/model-cycle.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Model Life Cycle
slug: /model-cycle
---

## Load model
1 change: 1 addition & 0 deletions docs/docs/new/quickstart.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Quickstart
slug: /quickstart
---

## Step 1: Install Nitro
9 changes: 8 additions & 1 deletion docs/openapi/NitroAPI.yaml
Original file line number Diff line number Diff line change
@@ -437,6 +437,10 @@ components:
default: true
nullable: true
description: Determines if output generation is in a streaming manner.
cache_prompt:
type: boolean
default: true
description: Optimize performance in repeated or similar requests.
temp:
type: number
default: 0.7
@@ -577,7 +581,10 @@ components:
min: 0
max: 1
description: Set probability threshold for more relevant outputs

cache_prompt:
type: boolean
default: true
description: Optimize performance in repeated or similar requests.
ChatCompletionResponse:
type: object
description: Description of the response structure
