Commit 0f7b3f0 (parent: c90a3a9)
feat: update README.md; fix: remove cortex-cpp
Co-authored-by: vansangpfiev <[email protected]>
Showing 1 changed file with 102 additions and 1 deletion.

# cortex.onnx

cortex.onnx is a high-efficiency C++ inference engine for edge computing, focused on the Windows platform and using DirectML for GPU acceleration.

It is a dynamic library that can be loaded by any server at runtime.

# Repo Structure
```
.
├── base              -> Engine interface
├── examples          -> Server example to integrate engine
├── onnxruntime-genai -> Upstream onnxruntime-genai
├── src               -> Engine implementation
├── third-party       -> Dependencies of the cortex.onnx project
```

## Build from source

This guide provides step-by-step instructions for building cortex.onnx from source on Windows systems.

## Clone the Repository

First, clone the cortex.onnx repository:

```bash
git clone --recurse https://github.com/janhq/cortex.onnx.git
```

If you don't have git, you can download the source code as a file archive from [cortex.onnx GitHub](https://github.com/janhq/cortex.onnx).
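
As a rough sketch, the archive route could look like the following; the `main` branch name in the URL is an assumption, and note that a plain archive download does not include the git submodules that the `--recurse` clone above pulls in:

```bash
# Download and unpack a source archive (branch name "main" is an assumption)
curl -L -o cortex.onnx.zip https://github.com/janhq/cortex.onnx/archive/refs/heads/main.zip
# tar on Windows 10+ (bsdtar) can extract zip archives
tar -xf cortex.onnx.zip
```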

## Build library with server example
- **On Windows**
Install CMake and MSBuild, then run:
```
# Build dependencies
./build_cortex_onnx.bat
# Build engine
mkdir build
cd build
cmake ..
cmake --build . --config Release -j4
# Build server example (run from the repository root)
mkdir -p examples/server/build
cd examples/server/build
cmake ..
cmake --build . --config Release -j4
```
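
As a quick sanity check (not part of the original instructions), you can confirm from the repository root that the artifacts used in the Quickstart below were produced; the paths assume the multi-config `Release` layout of the commands above:

```bash
# Verify the engine library and server example exist after the Release build
ls build/Release/engine.dll
ls examples/server/build/Release/server.exe
```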

# Quickstart
**Step 1: Downloading a Model**

Clone a model from https://huggingface.co/cortexhub and check out its dml branch.
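
A minimal sketch of that step, assuming a hypothetical `llama3` repository under cortexhub (substitute the model you actually want) and cloning into a `model/llama3` folder relative to the directory you will run the server from, so it matches the `model_path` used in Step 3. Model weights on Hugging Face are typically stored with Git LFS, so `git lfs install` may be needed first:

```bash
# Hypothetical example: clone a cortexhub model and switch to its dml branch
git lfs install
git clone https://huggingface.co/cortexhub/llama3 model/llama3
cd model/llama3
git checkout dml
cd ../..
```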

**Step 2: Start server**
- **On Windows**

```bash
cd examples/server/build/Release
mkdir -p engines\cortex.onnx\
cp ..\..\..\..\build\Release\engine.dll engines\cortex.onnx\
cp ..\..\..\..\onnxruntime-genai\build\Release\*.dll .\
server.exe
```

**Step 3: Load model**
```bash title="Load model"
curl http://localhost:3928/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "model_path": "./model/llama3",
    "model_alias": "llama3",
    "system_prompt": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
    "user_prompt": "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n",
    "ai_prompt": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
  }'
```
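
The `system_prompt`, `user_prompt`, and `ai_prompt` values above are the Llama 3 chat template markers; if you load a different model, adjust them to that model's prompt format.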

**Step 4: Making an Inference**

```bash title="cortex.onnx Inference"
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      }
    ],
    "model": "llama3"
  }'
```
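
The endpoint path suggests an OpenAI-compatible chat completions response, though that schema is an assumption here; under that assumption, the reply text could be extracted with `jq`:

```bash
# Assumption: the response follows the OpenAI-style chat completions schema
curl -s http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq -r '.choices[0].message.content'
```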

Table of parameters

| Parameter    | Type   | Description                      |
|--------------|--------|----------------------------------|
| `model_path` | String | The file path to the onnx model. |