Skip to content

Commit

Permalink
Merge pull request #1586 from janhq/docs
Browse files Browse the repository at this point in the history
docs: Cortex.so page
  • Loading branch information
gabrielle-ong authored Oct 30, 2024
2 parents fcf2fb3 + 2c44b1e commit 041f6c6
Show file tree
Hide file tree
Showing 9 changed files with 174 additions and 221 deletions.
26 changes: 18 additions & 8 deletions .github/ISSUE_TEMPLATE/bug_report.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,14 +9,14 @@ body:
required: true
attributes:
label: "Cortex version"
description: "**Tip:** The version is in the app's bottom right corner"

description: "**Tip:** `cortex -v` outputs the version number"
- type: textarea
validations:
required: true
attributes:
label: "Describe the Bug"
description: "A clear & concise description of the bug"
label: "Describe the issue and expected behaviour"
description: "A clear & concise description of the issue encountered"

- type: textarea
attributes:
Expand All @@ -31,20 +31,30 @@ body:
attributes:
label: "Screenshots / Logs"
description: |
You can find logs in: ~/cortex/logs
Please include cortex-cli.log and cortex.log files in: ~/cortex/logs/
- type: checkboxes
attributes:
label: "What is your OS?"
options:
- label: MacOS
- label: Windows
- label: Linux
- label: Mac Silicon
- label: Mac Intel
- label: Linux / Ubuntu

- type: checkboxes
attributes:
label: "What engine are you running?"
options:
- label: cortex.llamacpp (default)
- label: cortex.tensorrt-llm (Nvidia GPUs)
- label: cortex.onnx (NPUs, DirectML)
- label: cortex.onnx (NPUs, DirectML)

- type: input
validations:
required: true
attributes:
label: "Hardware Specs eg OS version, GPU"
description:


10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ Cortex is a Local AI API Platform that is used to run and customize LLMs.
Key Features:
- Straightforward CLI (inspired by Ollama)
- Full C++ implementation, packageable into Desktop and Mobile apps
- Pull from Huggingface of Cortex Built-in Model Library
- Pull from Huggingface, or Cortex Built-in Models
- Models stored in universal file formats (vs blobs)
- Swappable Engines (default: [`llamacpp`](https://github.com/janhq/cortex.llamacpp), future: [`ONNXRuntime`](https://github.com/janhq/cortex.onnx), [`TensorRT-LLM`](https://github.com/janhq/cortex.tensorrt-llm))
- Cortex can be deployed as a standalone API server, or integrated into apps like [Jan.ai](https://jan.ai/)
Expand Down Expand Up @@ -88,22 +88,22 @@ Refer to our [Quickstart](https://cortex.so/docs/quickstart/) and
### API:
Cortex.cpp includes a REST API accessible at `localhost:39281`.

Refer to our [API documentation](https://cortex.so/api-reference) for more details
Refer to our [API documentation](https://cortex.so/api-reference) for more details.

## Models & Quantizations
## Models

Cortex.cpp allows users to pull models from multiple Model Hubs, offering flexibility and extensive model access.

Currently Cortex supports pulling from:
- Hugging Face: GGUF models eg `author/Model-GGUF`
- [Hugging Face](https://huggingface.co): GGUF models eg `author/Model-GGUF`
- Cortex Built-in Models

Once downloaded, the model `.gguf` and `model.yml` files are stored in `~\cortexcpp\models`.

> **Note**:
> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
### Cortex Model Hub & Quantizations
### Cortex Built-in Models & Quantizations

| Model /Engine | llama.cpp | Command |
| -------------- | --------------------- | ----------------------------- |
Expand Down
4 changes: 0 additions & 4 deletions docs/docs/architecture.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,3 @@ Our development roadmap outlines key features and epics we will focus on in the

- **RAG**: Improve response quality and contextual relevance in our AI models.
- **Cortex Python Runtime**: Provide a scalable Python execution environment for Cortex.

:::info
For a full list of Cortex development roadmap, please see [here](https://discord.com/channels/1107178041848909847/1230770299730001941).
:::
104 changes: 67 additions & 37 deletions docs/docs/overview.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -10,39 +10,82 @@ import TabItem from "@theme/TabItem";

# Cortex

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::info
**Real-world Use**: Cortex.cpp powers [Jan](https://jan.ai), our on-device ChatGPT-alternative.

Cortex.cpp is in active development. If you have any questions, please reach out to us on [GitHub](https://github.com/janhq/cortex.cpp/issues/new/choose)
or [Discord](https://discord.com/invite/FTk2MvZwJH)
:::

![Cortex Cover Image](/img/social-card.jpg)

Cortex.cpp lets you run AI easily on your computer.

Cortex.cpp is a C++ command-line interface (CLI) designed as an alternative to Ollama. By default, it runs on the `llama.cpp` engine but also supports other engines, including `ONNX` and `TensorRT-LLM`, making it a multi-engine platform.
Cortex is a Local AI API Platform that is used to run and customize LLMs.

## Supported Accelerators
- Nvidia CUDA
- Apple Metal
- Qualcomm AI Engine
Key Features:
- Straightforward CLI (inspired by Ollama)
- Full C++ implementation, packageable into Desktop and Mobile apps
- Pull from Huggingface, or Cortex Built-in Model Library
- Models stored in universal file formats (vs blobs)
- Swappable Inference Backends (default: [`llamacpp`](https://github.com/janhq/cortex.llamacpp), future: [`ONNXRuntime`](https://github.com/janhq/cortex.onnx), [`TensorRT-LLM`](https://github.com/janhq/cortex.tensorrt-llm))
- Cortex can be deployed as a standalone API server, or integrated into apps like [Jan.ai](https://jan.ai/)

## Supported Inference Backends
- [llama.cpp](https://github.com/ggerganov/llama.cpp): cross-platform, supports most laptops, desktops and OSes
- [ONNX Runtime](https://github.com/microsoft/onnxruntime): supports Windows Copilot+ PCs & NPUs
- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM): supports Nvidia GPUs

If GPU hardware is available, Cortex is GPU accelerated by default.

:::info
**Real-world Use**: Cortex.cpp powers [Jan](https://jan.ai), our on-device ChatGPT-alternative.
Cortex's roadmap is to implement the full OpenAI API including Tools, Runs, Multi-modal and Realtime APIs.

Cortex.cpp has been battle-tested across 1 million+ downloads and handles a variety of hardware configurations.
:::

## Supported Models
## Inference Backends
- Default: [llama.cpp](https://github.com/ggerganov/llama.cpp): cross-platform, supports most laptops, desktops and OSes
- Future: [ONNX Runtime](https://github.com/microsoft/onnxruntime): supports Windows Copilot+ PCs & NPUs
- Future: [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM): supports Nvidia GPUs

Cortex.cpp supports the following list of [Built-in Models](/models):
If GPU hardware is available, Cortex is GPU accelerated by default.

<Tabs>
## Models
Cortex.cpp allows users to pull models from multiple Model Hubs, offering flexibility and extensive model access.
- [Hugging Face](https://huggingface.co)
- [Cortex Built-in Models](https://cortex.so/models)

> **Note**:
> As a very general guide: You should have >8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
### Cortex Built-in Models & Quantizations
| Model /Engine | llama.cpp | Command |
| -------------- | --------------------- | ----------------------------- |
| phi-3.5 || cortex run phi3.5 |
| llama3.2 || cortex run llama3.1 |
| llama3.1 || cortex run llama3.1 |
| codestral || cortex run codestral |
| gemma2 || cortex run gemma2 |
| mistral || cortex run mistral |
| ministral || cortex run ministral |
| qwen2 || cortex run qwen2.5 |
| openhermes-2.5 || cortex run openhermes-2.5 |
| tinyllama || cortex run tinyllama |

View all [Cortex Built-in Models](https://cortex.so/models).

Cortex supports multiple quantizations for each model.
```
❯ cortex-nightly pull llama3.2
Downloaded models:
llama3.2:3b-gguf-q2-k
Available to download:
1. llama3.2:3b-gguf-q3-kl
2. llama3.2:3b-gguf-q3-km
3. llama3.2:3b-gguf-q3-ks
4. llama3.2:3b-gguf-q4-km (default)
5. llama3.2:3b-gguf-q4-ks
6. llama3.2:3b-gguf-q5-km
7. llama3.2:3b-gguf-q5-ks
8. llama3.2:3b-gguf-q6-k
9. llama3.2:3b-gguf-q8-0
Select a model (1-9):
```


{/*
<Tabs>
<TabItem value="Llama.cpp" label="Llama.cpp" default>
| Model ID | Variant (Branch) | Model size | CLI command |
|------------------|------------------|-------------------|------------------------------------|
Expand Down Expand Up @@ -86,17 +129,4 @@ Cortex.cpp supports the following list of [Built-in Models](/models):
| openhermes-2.5 | 7b-tensorrt-llm-linux-ada | 7B | `cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada`|
</TabItem>
</Tabs>
:::info
Cortex.cpp supports pulling `GGUF` and `ONNX` models from the [Hugging Face Hub](https://huggingface.co). Read how to [Pull models from Hugging Face](/docs/hub/hugging-face/)
:::

## Cortex.cpp Versions
Cortex.cpp offers three different versions of the app, each serving a unique purpose:
- **Stable**: The official release version of Cortex.cpp, designed for general use with proven stability.
- **Beta**: This version includes upcoming features still in testing, allowing users to try new functionality before the next official release.
- **Nightly**: Automatically built every night, this version includes the latest updates and changes from the engineering team but may be unstable.

:::info
Each of these versions has a different CLI prefix command.
:::
</Tabs> */}
Loading

0 comments on commit 041f6c6

Please sign in to comment.