Merge pull request #174 from janhq/tidy-Nitro
Tidy Nitro docs
Showing 12 changed files with 119 additions and 9,925 deletions.
@@ -17,11 +17,9 @@
 - Quick Setup: Approximately 10-second initialization for swift deployment.
 - Enhanced Web Framework: Incorporates drogon cpp to boost web service efficiency.
 
-## Documentation
-
 ## About Nitro
 
-Nitro is a light-weight integration layer (and soon to be inference engine) for cutting edge inference engine, make deployment of AI models easier than ever before!
+Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.
 
 The binary of nitro after zipped is only ~3mb in size with none to minimal dependencies (if you use a GPU need CUDA for example) make it desirable for any edge/server deployment 👍.

@@ -40,37 +38,57 @@ The binary of nitro after zipped is only ~3mb in size with none to minimal depen
 
 ## Quickstart
 
-**Step 1: Download Nitro**
+**Step 1: Install Nitro**
 
-To use Nitro, download the released binaries from the release page below:
+- For Linux and MacOS
 
-[![Download Nitro](https://img.shields.io/badge/Download-Nitro-blue.svg)](https://github.com/janhq/nitro/releases)
+```bash
+curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash -
+```
 
-After downloading the release, double-click on the Nitro binary.
+- For Windows
 
-**Step 2: Download a Model**
+```bash
+powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }"
+```
 
-Download a llama model to try running the llama C++ integration. You can find a "GGUF" model on The Bloke's page below:
+**Step 2: Downloading a Model**
 
-[![Download Model](https://img.shields.io/badge/Download-Model-green.svg)](https://huggingface.co/TheBloke)
+```bash
+mkdir model && cd model
+wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
+```
 
-**Step 3: Run Nitro**
+**Step 3: Run Nitro server**
 
-Double-click on Nitro to run it. After downloading your model, make sure it's saved to a specific path. Then, make an API call to load your model into Nitro.
+```bash title="Run Nitro server"
+nitro
+```
 
+**Step 4: Load model**
+
-```zsh
-curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
+```bash title="Load model"
+curl http://localhost:3928/inferences/llamacpp/loadmodel \
   -H 'Content-Type: application/json' \
   -d '{
-    "llama_model_path": "/path/to/your_model.gguf",
-    "ctx_len": 2048,
+    "llama_model_path": "/model/llama-2-7b-model.gguf",
+    "ctx_len": 512,
     "ngl": 100,
-    "embedding": true,
-    "n_parallel": 4,
-    "pre_prompt": "A chat between a curious user and an artificial intelligence",
-    "user_prompt": "USER: ",
-    "ai_prompt": "ASSISTANT: "
   }'
 ```
+
+**Step 5: Making an Inference**
+
+```bash title="Nitro Inference"
+curl http://localhost:3928/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      {
+        "role": "user",
+        "content": "Who won the world series in 2020?"
+      },
+    ]
+  }'
+```

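Note: the quickstart above assumes the server is already accepting requests by the time the model is loaded. As a rough sanity check, the sketch below starts Nitro in the background and polls it before calling `loadmodel`; the `/healthz` endpoint name is an assumption here and may differ across builds.

```bash
# Sketch only: start Nitro in the background, then wait for it to come up.
# Assumes a /healthz status endpoint exists; swap in whichever route your build exposes.
nitro &

until curl -sf http://localhost:3928/healthz > /dev/null; do
  sleep 0.5   # retry until the server answers
done
echo "Server is up; safe to POST to /inferences/llamacpp/loadmodel"
```
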
@@ -89,7 +107,6 @@ Table of parameters
 | `system_prompt` | String | The prompt to use for system rules. |
 | `pre_prompt` | String | The prompt to use for internal configuration. |
 
-
 ***OPTIONAL***: You can run Nitro on a different port like 5000 instead of 3928 by running it manually in terminal
 ```zsh
 ./nitro 1 127.0.0.1 5000 ([thread_num] [host] [port])

@@ -98,32 +115,13 @@ Table of parameters
 - host : host value normally 127.0.0.1 or 0.0.0.0
 - port : the port that nitro got deployed onto
 
-**Step 4: Perform Inference on Nitro for the First Time**
-
-```zsh
-curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
-    --header 'Content-Type: application/json' \
-    --header 'Accept: text/event-stream' \
-    --header 'Access-Control-Allow-Origin: *' \
-    --data '{
-        "messages": [
-            {"content": "Hello there 👋", "role": "assistant"},
-            {"content": "Can you write a long story", "role": "user"}
-        ],
-        "stream": true,
-        "model": "gpt-3.5-turbo",
-        "max_tokens": 2000
-    }'
-```
-
 Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.
 
 ## Compile from source
-To compile nitro please visit [Compile from source](docs/manual_install.md)
+To compile nitro please visit [Compile from source](docs/new/build-source.md)
 
 ### Contact
 
 - For support, please file a GitHub ticket.
 - For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
-- For long-form inquiries, please email [email protected].
+- For long-form inquiries, please email [email protected].

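Note: when Nitro is started on a custom port as the optional step above describes, every subsequent API call has to target that port. A minimal sketch, reusing the documented `loadmodel` request and the example port 5000:

```bash
# Run Nitro with 1 thread on 127.0.0.1:5000, then load the model against that port.
./nitro 1 127.0.0.1 5000 &

curl http://localhost:5000/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/model/llama-2-7b-model.gguf",
    "ctx_len": 512,
    "ngl": 100
  }'
```
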
@@ -1,5 +1,6 @@
 ---
 title: Architecture
+slug: /achitecture
 ---
 
 ![Nitro Architecture](img/architecture.drawio.png)

@@ -0,0 +1,20 @@
+---
+title: FAQs
+slug: /faq
+---
+
+### 1. Is Nitro the same as Llama.cpp with an API server?
+
+Yes, that's correct. However, Nitro isn't limited to just Llama.cpp; it will soon integrate multiple other models like Whisper, Bark, and Stable Diffusion, all in a single binary. This eliminates the need for you to develop a separate API server on top of AI models. Nitro is a comprehensive solution, designed for ease of use and efficiency.
+
+### 2. Is Nitro simply Llama-cpp-python?
+
+Indeed, Nitro isn't bound to Python, which allows you to leverage high-performance software that fully utilizes your system's capabilities. With Nitro, learning how to deploy a Python web server or use FastAPI isn't necessary. The Nitro web server is already fully optimized.
+
+### 3. Why should I switch to Nitro over Ollama?
+
+While Ollama does provide similar functionalities, its design serves a different purpose. Ollama has a larger size (around 200MB) compared to Nitro's 3MB distribution. Nitro's compact size allows for easy embedding into subprocesses, ensuring minimal concerns about package size for your application. This makes Nitro a more suitable choice for applications where efficiency and minimal resource usage are key.
+
+### 4. Why is the model named "chat-gpt-3.5"?
+
+Many applications implement the OpenAI ChatGPT API, and we want Nitro to be versatile for any AI client. While you can use any model name, we've ensured that if you're already using the chatgpt API, switching to Nitro is seamless. Just replace api.openai.com with localhost:3928 in your client settings (like Chatbox, Sillytavern, Oobaboga, etc.), and it will work smoothly with Nitro.

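Note: FAQ 4 boils down to changing the base URL in any OpenAI-compatible client. A minimal sketch of the swap, using the `/v1/chat/completions` route from the quickstart; the model name is arbitrary, as the FAQ explains, and "gpt-3.5-turbo" is simply carried over from the old README example:

```bash
# Same request shape as api.openai.com; only the host changes.
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello from a re-pointed OpenAI client"}]
  }'
```
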
@@ -1,5 +1,6 @@
 ---
 title: Model Life Cycle
+slug: /model-cycle
 ---
 
 ## Load model

@@ -1,5 +1,6 @@
 ---
 title: Quickstart
+slug: /quickstart
 ---
 
 ## Step 1: Install Nitro