Merge pull request #194 from janhq/docs-fix-27-11
Minor fix on Nitro Docs
tikikun authored Nov 28, 2023
2 parents 0544364 + 70bde1a commit 03e8913
Showing 22 changed files with 165 additions and 220 deletions.
24 changes: 0 additions & 24 deletions docs/docs/demos/chatbox-vid.mdx

This file was deleted.

63 changes: 0 additions & 63 deletions docs/docs/examples/chatbox.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/docs/examples/jan.md
@@ -1,5 +1,6 @@
---
title: Nitro with Jan
description: Nitro integrates with Jan to enable a ChatGPT-like functional app, optimized for local AI.
---

You can effortlessly utilize Nitro through [Jan](https://jan.ai/), as it is fully integrated with all its functions. With Jan, using Nitro becomes straightforward without the need for any coding.
29 changes: 19 additions & 10 deletions docs/docs/examples/openai-node.md
@@ -1,9 +1,10 @@
---
title: Nitro with openai-node
description: Nitro integration guide for Node.js.
---

You can quickly migrate from the OpenAI API or Azure OpenAI to Nitro using your existing Node.js code.
> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
> The **ONLY** thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
- NodeJS OpenAI SDK: https://www.npmjs.com/package/openai

## Chat Completion
@@ -240,17 +241,23 @@ embedding();
</table>

## Audio
Coming soon

:::info Coming soon
:::

## How to reproduce
1. Step 1: Dependencies installation
```

**Step 1:** Dependencies installation

```bash
npm install --save openai typescript
# or
yarn add openai typescript
```
2. Step 2: Fill `tsconfig.json`
```json

**Step 2:** Fill `tsconfig.json`

```js
{
"compilerOptions": {
"moduleResolution": "node",
@@ -263,7 +270,9 @@ yarn add openai
"lib": ["es2015"]
}
```
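
For reference, a complete `tsconfig.json` consistent with the excerpt above might look like the sketch below. Only `moduleResolution` and `lib` are taken from the excerpt; the remaining fields are common defaults chosen so that `npx tsc` emits `dist/index.js` as used in the later steps.

```json
{
  "compilerOptions": {
    "moduleResolution": "node",
    "module": "commonjs",
    "target": "es2015",
    "lib": ["es2015"],
    "esModuleInterop": true,
    "skipLibCheck": true,
    "sourceMap": true,
    "outDir": "dist"
  },
  "include": ["index.ts"]
}
```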
3. Step 3: Fill `index.ts` file with code
3. Step 4: Build with `npx tsc`
4. Step 5: Run the code with `node dist/index.js`
5. Step 6: Enjoy!

**Step 3:** Fill `index.ts` file with code.
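
As a starting point, a minimal `index.ts` might look like the sketch below. It mirrors the Chat Completion example above; the model name is a placeholder, and the API key is arbitrary because Nitro does not validate it.

```ts
import OpenAI from 'openai';

// Point the OpenAI SDK at the local Nitro server instead of api.openai.com.
const openai = new OpenAI({
  baseURL: 'http://localhost:3928/v1/',
  apiKey: 'sk-xxx', // placeholder; Nitro does not validate the key
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo', // placeholder; Nitro serves whichever model you loaded
    messages: [{ role: 'user', content: 'Say this is a test' }],
  });
  console.log(completion.choices[0].message.content);
}

main();
```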

**Step 4:** Build with `npx tsc`.

**Step 5:** Run the code with `node dist/index.js`.
47 changes: 28 additions & 19 deletions docs/docs/examples/openai-python.md
@@ -1,10 +1,11 @@
---
title: Nitro with openai-python
description: Nitro integration guide for Python.
---


You can quickly migrate from the OpenAI API or Azure OpenAI to Nitro using your existing Python code.
> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
> The **ONLY** thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
- Python OpenAI SDK: https://pypi.org/project/openai/

## Chat Completion
@@ -22,7 +23,10 @@ import asyncio
from openai import AsyncOpenAI

# gets API Key from environment variable OPENAI_API_KEY
client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
client = AsyncOpenAI(
base_url="http://localhost:3928/v1/",
api_key="sk-xxx"
)


async def main() -> None:
@@ -74,22 +78,16 @@ asyncio.run(main())
```python
from openai import AzureOpenAI

openai.api_key = '...' # Default is environment variable AZURE_OPENAI_API_KEY
openai.api_key = '...' # Default is AZURE_OPENAI_API_KEY

stream = AzureOpenAI(
api_version=api_version,
# https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource
azure_endpoint="https://example-endpoint.openai.azure.com",
)

completion = client.chat.completions.create(
model="deployment-name", # e.g. gpt-35-instant
messages=[
{
"role": "user",
"content": "How do I output all files in a directory using Python?",
},
],
messages=[{"role": "user", "content": "Say this is a test"}],
stream=True,
)
for part in stream:
@@ -115,11 +113,15 @@ import asyncio
from openai import AsyncOpenAI

# gets API Key from environment variable OPENAI_API_KEY
client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
client = AsyncOpenAI(base_url="http://localhost:3928/v1/",
api_key="sk-xxx")


async def main() -> None:
embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
embedding = await client.embeddings.create(
input='Hello How are you?',
model='text-embedding-ada-002'
)
print(embedding)

asyncio.run(main())
@@ -140,7 +142,10 @@ client = AsyncOpenAI(api_key="sk-xxx")


async def main() -> None:
embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
embedding = await client.embeddings.create(
input='Hello How are you?',
model='text-embedding-ada-002'
)
print(embedding)

asyncio.run(main())
@@ -173,13 +178,17 @@ print(embeddings)
</table>

## Audio
Coming soon

:::info Coming soon
:::

## How to reproduce
1. Step 1: Dependencies installation
```
**Step 1:** Dependencies installation.

```bash title="Install OpenAI"
pip install openai
```
3. Step 2: Fill `index.py` file with code
4. Step 3: Run the code with `python index.py`
5. Step 5: Enjoy!

**Step 2:** Fill `index.py` file with code.
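
As a starting point, a minimal `index.py` might look like the sketch below. It mirrors the Chat Completion example above; the model name is a placeholder, and the API key is arbitrary because Nitro does not validate it.

```python
import asyncio

from openai import AsyncOpenAI

# Point the OpenAI SDK at the local Nitro server instead of api.openai.com.
client = AsyncOpenAI(
    base_url="http://localhost:3928/v1/",
    api_key="sk-xxx"  # placeholder; Nitro does not validate the key
)


async def main() -> None:
    completion = await client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; Nitro serves whichever model you loaded
        messages=[{"role": "user", "content": "Say this is a test"}],
    )
    print(completion.choices[0].message.content)

asyncio.run(main())
```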

**Step 3:** Run the code with `python index.py`.
15 changes: 9 additions & 6 deletions docs/docs/examples/palchat.md
@@ -1,5 +1,6 @@
---
title: Nitro with Pal Chat
description: Nitro integration guide for mobile devices.
---

This guide demonstrates how to use Nitro with Pal Chat, enabling local AI chat capabilities on mobile devices.
@@ -15,15 +16,15 @@ Pal is a mobile app available on the App Store. It offers a customizable chat pl
**1. Start Nitro server**

Open your terminal:
```
```bash title="Run Nitro"
nitro
```

**2. Download Model**

Use these commands to download and save the [Llama2 7B chat model](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main):

```bash
```bash title="Get a model"
mkdir model && cd model
wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
```
@@ -34,7 +35,7 @@ wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GG

To load the model, use the following command:

```
```bash title="Load model to the server"
curl http://localhost:3928/inferences/llamacpp/loadmodel \
-H 'Content-Type: application/json' \
-d '{
@@ -44,11 +45,13 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
}'
```

**4. Config Pal Chat**
**4. Configure Pal Chat**

In the `OpenAI API Key` field, enter any placeholder text (e.g., `key-xxxxxx`).

Adjust the `provide custom host` setting under `advanced settings` in Pal Chat to connect with Nitro. Enter your LAN IPv4 address (It should be something like 192.xxx.x.xxx).
Adjust the `provide custom host` setting under `advanced settings` in Pal Chat with your LAN IPv4 address (a series of numbers like 192.xxx.x.xxx).

> For instruction read: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
> For instruction: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
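
For example, you can look up your IPv4 address from a terminal (the commands below are standard OS utilities, not part of Nitro):

```bash title="Find your LAN IPv4 address"
# Windows
ipconfig

# macOS
ipconfig getifaddr en0

# Linux
hostname -I
```
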
![PalChat](img/pal.png)

3 changes: 2 additions & 1 deletion docs/docs/features/chat.md
@@ -1,10 +1,11 @@
---
title: Chat Completion
description: Inference engine for chat completion, the same as OpenAI's
---

The Chat Completion feature in Nitro provides a flexible way to interact with any local Large Language Model (LLM).

## Single Request Example
### Single Request Example

To send a single query to your chosen LLM, follow these steps:
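
As a quick illustration, a single request to Nitro's OpenAI-compatible endpoint looks roughly like the sketch below (assuming the server is running on the default port `3928` and a model has already been loaded; the payload follows the OpenAI chat format):

```bash title="Single request example"
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```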

23 changes: 9 additions & 14 deletions docs/docs/features/cont-batch.md
@@ -1,20 +1,19 @@
---
title: Continuous Batching
description: Nitro's continuous batching combines multiple requests, enhancing throughput.
---

## What is continous batching?
Continuous batching boosts throughput and minimizes latency in large language model (LLM) inference. This technique groups multiple inference requests, significantly improving GPU utilization.

Continuous batching is a powerful technique that significantly boosts throughput in large language model (LLM) inference while minimizing latency. This process dynamically groups multiple inference requests, allowing for more efficient GPU utilization.
**Key Advantages:**

## Why Continuous Batching?
- Increased Throughput.
- Reduced Latency.
- Efficient GPU Use.

Traditional static batching methods can lead to underutilization of GPU resources, as they wait for all sequences in a batch to complete before moving on. Continuous batching overcomes this by allowing new sequences to start processing as soon as others finish, ensuring more consistent and efficient GPU usage.
**Implementation Insight:**

## Benefits of Continuous Batching

- **Increased Throughput:** Improvement over traditional batching methods.
- **Reduced Latency:** Lower p50 latency, leading to faster response times.
- **Efficient Resource Utilization:** Maximizes GPU memory and computational capabilities.
To evaluate its effectiveness, compare continuous batching with traditional methods. For more details on benchmarking, refer to this [article](https://www.anyscale.com/blog/continuous-batching-llm-inference).

## How to use continuous batching
Nitro's `continuous batching` feature allows you to combine multiple requests for the same model execution, enhancing throughput and efficiency.
@@ -30,8 +29,4 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
}'
```

For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.

### Benchmark and Compare

To understand the impact of continuous batching on your system, perform benchmarks comparing it with traditional batching methods. This [article](https://www.anyscale.com/blog/continuous-batching-llm-inference) will help you quantify improvements in throughput and latency.
For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.
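
For illustration, a sketch of the full flow might look like the following. The `nitro [thread_num] [host] [port]` launch syntax and the `cont_batching`/`n_parallel` parameters are assumed from the Multithreading doc and the load-model example above; adjust the model path to your setup.

```bash title="Continuous batching with matched n_parallel and thread_num"
# Launch Nitro with 4 threads ...
nitro 4 127.0.0.1 3928

# ... then load the model with continuous batching enabled and
# n_parallel matched to thread_num (both 4 here).
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 512,
    "cont_batching": true,
    "n_parallel": 4
  }'
```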