diff --git a/README.md b/README.md
index 75c367e76..eed1faaf6 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
-Getting Started - Docs
+Getting Started - Docs - Changelog - Bug reports - Discord
@@ -67,11 +67,27 @@ curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
     "llama_model_path": "/path/to/your_model.gguf",
     "ctx_len": 2048,
     "ngl": 100,
-    "embedding": true
+    "embedding": true,
+    "n_parallel": 4,
+    "pre_prompt": "A chat between a curious user and an artificial intelligence",
+    "user_prompt": "what is AI?"
   }'
 ```
 
-`ctx_len` and `ngl` are typical llama C++ parameters, and `embedding` determines whether to enable the embedding endpoint or not.
+**Table of parameters**
+
+| Parameter          | Type    | Description                                                              |
+|--------------------|---------|--------------------------------------------------------------------------|
+| `llama_model_path` | String  | The file path to the LLaMA model.                                        |
+| `ngl`              | Integer | The number of GPU layers to use.                                         |
+| `ctx_len`          | Integer | The context length for the model operations.                             |
+| `embedding`        | Boolean | Whether to use embedding in the model.                                   |
+| `n_parallel`       | Integer | The number of parallel operations. Uses the Drogon thread count if not set. |
+| `cont_batching`    | Boolean | Whether to use continuous batching.                                      |
+| `user_prompt`      | String  | The prompt to use for the user.                                          |
+| `ai_prompt`        | String  | The prompt to use for the AI assistant.                                  |
+| `system_prompt`    | String  | The prompt to use for system rules.                                      |
+| `pre_prompt`       | String  | The prompt to use for internal configuration.                            |
 
 **Step 4: Perform Inference on Nitro for the First Time**
 
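As a usage sketch of the parameters this hunk documents: the endpoint and parameter names below are taken from the table itself, while the concrete values, and the reading of `system_prompt`/`user_prompt`/`ai_prompt` as role markers for the chat template, are assumptions for illustration rather than Nitro defaults.

```bash
# Sketch: load a model with an explicit prompt template.
# Endpoint and parameter names come from the table above; the
# Vicuna-style role markers and all values are illustrative assumptions.
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": true,
    "n_parallel": 4,
    "cont_batching": true,
    "pre_prompt": "A chat between a curious user and an artificial intelligence assistant.",
    "system_prompt": "SYSTEM: ",
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: "
  }'
```

Setting `n_parallel` together with `cont_batching` lets the server interleave several requests over one loaded model; per the table, leaving `n_parallel` unset falls back to the Drogon thread count.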