Update Engine interface (#1759)
* chore: add document

* feat: update engine interface
namchuai authored Dec 10, 2024
1 parent 0fa83b2 commit 43e740d
Showing 10 changed files with 341 additions and 218 deletions.
235 changes: 178 additions & 57 deletions docs/docs/engines/engine-extension.mdx
@@ -1,89 +1,210 @@
---
title: Adding a Third-Party Engine to Cortex
description: Cortex supports Engine Extensions to integrate both local inference engines and remote APIs.
---

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Guide to Adding a Third-Party Engine to Cortex

## Introduction

This guide outlines the steps to integrate a custom engine with Cortex. We hope this helps developers understand the integration process.

## Implementation Steps

### 1. Implement the Engine Interface

First, create an engine that implements the `EngineI` interface defined in `EngineI.h`. Here's the interface definition:

```cpp
class EngineI {
 public:
  struct RegisterLibraryOption {
    std::vector<std::filesystem::path> paths;
  };

  struct EngineLoadOption {
    // engine
    std::filesystem::path engine_path;
    std::filesystem::path cuda_path;
    bool custom_engine_path;

    // logging
    std::filesystem::path log_path;
    int max_log_lines;
    trantor::Logger::LogLevel log_level;
  };

  struct EngineUnloadOption {
    bool unload_dll;
  };

  virtual ~EngineI() {}

  virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;

  virtual void Load(EngineLoadOption opts) = 0;

  virtual void Unload(EngineUnloadOption opts) = 0;

  // Cortex.llamacpp interface methods
  virtual void HandleChatCompletion(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void HandleEmbedding(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void LoadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void UnloadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void GetModelStatus(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  // Compatibility and model management
  virtual bool IsSupported(const std::string& f) = 0;

  virtual void GetModels(
      std::shared_ptr<Json::Value> jsonBody,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  // Logging configuration
  virtual bool SetFileLogger(int max_log_lines,
                             const std::string& log_path) = 0;
  virtual void SetLogLevel(trantor::Logger::LogLevel logLevel) = 0;
};
```
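
To make the shape of an implementation concrete, here is a minimal sketch of a third-party engine deriving from `EngineI`. Everything specific in it, the `MyEngine` name, its members, and the JSON status fields, is a placeholder rather than part of the Cortex API; the three lifecycle methods are only declared here and are sketched in the sections below.

```cpp
#include <mutex>
#include <string>
#include <unordered_set>
#include <utility>

#include "cortex-common/EngineI.h"  // adjust the include path to your build setup

class MyEngine : public EngineI {
 public:
  ~MyEngine() override = default;

  // Lifecycle methods, sketched in the sections below.
  void RegisterLibraryPath(RegisterLibraryOption opts) override;
  void Load(EngineLoadOption opts) override;
  void Unload(EngineUnloadOption opts) override;

  void HandleChatCompletion(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) override {
    // Illustrative only: report completion through the status/result callback.
    Json::Value status;
    status["is_done"] = true;   // field names here are placeholders
    status["has_error"] = false;
    Json::Value result;
    result["choices"] = Json::Value(Json::arrayValue);
    callback(std::move(status), std::move(result));
  }

  // Remaining pure virtuals stubbed out to keep the sketch short.
  void HandleEmbedding(std::shared_ptr<Json::Value>,
                       std::function<void(Json::Value&&, Json::Value&&)>&&) override {}
  void LoadModel(std::shared_ptr<Json::Value>,
                 std::function<void(Json::Value&&, Json::Value&&)>&&) override {}
  void UnloadModel(std::shared_ptr<Json::Value>,
                   std::function<void(Json::Value&&, Json::Value&&)>&&) override {}
  void GetModelStatus(std::shared_ptr<Json::Value>,
                      std::function<void(Json::Value&&, Json::Value&&)>&&) override {}
  void GetModels(std::shared_ptr<Json::Value>,
                 std::function<void(Json::Value&&, Json::Value&&)>&&) override {}

  bool IsSupported(const std::string& f) override {
    // Advertise only the methods this engine actually implements.
    return f == "HandleChatCompletion";
  }

  bool SetFileLogger(int max_log_lines, const std::string& log_path) override {
    return true;
  }
  void SetLogLevel(trantor::Logger::LogLevel log_level) override {}

 private:
  std::mutex mutex_;
  std::unordered_set<std::string> loaded_models_;  // model ids tracked by this sketch
};
```

Only `HandleChatCompletion` is advertised through `IsSupported` here; a real engine would implement and advertise the full set of methods it supports.
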
#### Lifecycle Management
##### RegisterLibraryPath
```cpp
virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;
```

This method is called during engine initialization to set up dynamic library search paths. For example, on Linux we still have to use `LD_LIBRARY_PATH` to add CUDA dependencies to the search path.

**Parameters:**

- `opts.paths`: Vector of filesystem paths that the engine should register

**Implementation Requirements:**

- Register provided paths for dynamic library loading
- Handle invalid paths gracefully
- Thread-safe implementation
- No exceptions should escape the method
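
Continuing the `MyEngine` sketch, one possible `RegisterLibraryPath` on Linux simply folds the provided paths into `LD_LIBRARY_PATH` so that CUDA and other native dependencies resolve when the engine process starts; other platforms would need their own branch, and the exact strategy is up to the engine author.

```cpp
#include <cstdlib>  // std::getenv, setenv
#include <string>

void MyEngine::RegisterLibraryPath(EngineI::RegisterLibraryOption opts) {
#if defined(__linux__)
  std::string merged;
  for (const auto& p : opts.paths) {
    merged += p.string() + ":";
  }
  if (const char* current = std::getenv("LD_LIBRARY_PATH")) {
    merged += current;
  }
  // Best effort: swallow failures rather than letting exceptions escape.
  ::setenv("LD_LIBRARY_PATH", merged.c_str(), /*overwrite=*/1);
#else
  (void)opts;  // Windows/macOS handling is out of scope for this sketch
#endif
}
```
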
##### Load
```cpp
virtual void Load(EngineLoadOption opts) = 0;
```

Initializes the engine with the provided configuration options.

**Parameters:**

- `engine_path`: Base path for engine files
- `cuda_path`: Path to CUDA installation
- `custom_engine_path`: Flag for using custom engine location
- `log_path`: Location for log files
- `max_log_lines`: Maximum number of lines per log file
- `log_level`: Logging verbosity level

**Implementation Requirements:**

- Validate all paths before use
- Initialize engine components
- Set up logging configuration
- Handle missing dependencies gracefully
- Clean initialization state in case of failures
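
As a sketch of those requirements, a `Load` implementation might validate the supplied paths and wire up logging before doing any engine-specific initialization; the checks shown here are illustrative, not a prescribed sequence.

```cpp
#include <filesystem>

void MyEngine::Load(EngineI::EngineLoadOption opts) {
  // Validate paths before use; degrade gracefully instead of throwing.
  if (!std::filesystem::exists(opts.engine_path)) {
    return;  // leave the engine in a clean, unloaded state
  }

  // Route engine logs to the location Cortex designates.
  SetFileLogger(opts.max_log_lines, opts.log_path.string());
  SetLogLevel(opts.log_level);

  // Engine-specific setup (weight caches, thread pools, CUDA context, ...) goes here.
}
```
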
##### Unload
```cpp
virtual void Unload(EngineUnloadOption opts) = 0;
```

Performs cleanup and shutdown of the engine.

**Parameters:**

- `unload_dll`: Boolean flag indicating whether to unload dynamic libraries

**Implementation Requirements:**

- Clean up all allocated resources
- Close file handles and connections
- Release memory
- Ensure proper shutdown of running models
- Handle cleanup in a thread-safe manner
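
A matching `Unload` sketch, still using the `MyEngine` placeholder members, releases per-model state under a lock; whether more work is needed when `unload_dll` is set depends on what the engine allocated at load time.

```cpp
void MyEngine::Unload(EngineI::EngineUnloadOption opts) {
  std::lock_guard<std::mutex> lock(mutex_);  // keep cleanup thread-safe
  loaded_models_.clear();                    // stop and forget every loaded model
  if (opts.unload_dll) {
    // The host may unload this shared library afterwards, so make sure no
    // background threads are still executing engine code at this point.
  }
}
```
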
### 2. Create a Dynamic Library
We recommend using the [dylib library](https://github.com/martin-olivier/dylib) to build your dynamic library. This library provides helpful tools for creating cross-platform dynamic libraries.
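
To sketch how that fits together, the engine can expose a plain C factory function, and a host can resolve it through dylib. The `create_engine` symbol name, the file layout, and the constructor arguments below are examples only; check the engine-loading code in Cortex for the exact symbols and file names it expects.

```cpp
// engine_entry.cc -- compiled into the shared library (libengine.so / libengine.dylib / engine.dll)
#include "cortex-common/EngineI.h"
// #include "my_engine.h"  // your engine implementation (hypothetical header)

extern "C" EngineI* create_engine() {  // example symbol name
  return new MyEngine();
}

// host_example.cc -- resolving that symbol with the dylib helper library
#include <dylib.hpp>

int main() {
  dylib lib("./engines/myengine", "engine");  // dylib adds the platform prefix/suffix
  auto create = lib.get_function<EngineI*()>("create_engine");
  EngineI* engine = create();
  // ... drive the engine through the EngineI interface ...
  delete engine;
  return 0;
}
```

Keeping the factory `extern "C"` avoids C++ name mangling, which is what makes the symbol straightforward to look up from the host side.
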
### 3. Package Dependencies
Please ensure all dependencies are included with your dynamic library. This allows us to create a single, self-contained package for distribution.
### 4. Publication and Integration
#### 4.1 Publishing Your Engine (Optional)
If you wish to make your engine publicly available, you can publish it through GitHub. For reference, examine the [cortex.llamacpp releases](https://github.com/janhq/cortex.llamacpp/releases) structure:
- Each release tag should represent your version
- Include all variants within the same release
- Cortex will automatically select the most suitable variant or allow users to specify their preferred variant
#### 4.2 Integration with Cortex
Once your engine is ready, we encourage you to:
1. Notify the Cortex team about your engine for potential inclusion in our default supported engines list
2. Allow us to help test and validate your implementation
### 5. Local Testing Guide
To test your engine locally:
1. Create a directory structure following this hierarchy:
```bash
engines/
└── cortex.llamacpp/
└── mac-arm64/
└── v0.1.40/
├── libengine.dylib
└── version.txt
```

2. Configure your engine:

   - Edit the `~/.cortexrc` file to register your engine name
   - Add your model with the appropriate engine field in `model.yaml`

3. Testing:

   - Start the engine
   - Load your model
   - Verify functionality

## Future Development

We're currently working on expanding support for additional release sources to make distribution more flexible.

## Contributing

We welcome suggestions and contributions to improve this integration process. Please feel free to submit issues or pull requests through our repository.
22 changes: 6 additions & 16 deletions engine/cli/commands/server_start_cmd.cc
@@ -1,9 +1,12 @@
#include "server_start_cmd.h"
#include "commands/cortex_upd_cmd.h"
#include "services/engine_service.h"
#include "utils/cortex_utils.h"
#include "utils/engine_constants.h"
#include "utils/file_manager_utils.h"

#if defined(_WIN32) || defined(_WIN64)
#include "utils/widechar_conv.h"
#endif

namespace commands {

@@ -108,22 +111,9 @@ bool ServerStartCmd::Exec(const std::string& host, int port,
std::cerr << "Could not start server: " << std::endl;
return false;
} else if (pid == 0) {
// Some engines requires to add lib search path before process being created
EngineService().RegisterEngineLibPath();
std::string p = cortex_utils::GetCurrentPath() + "/" + exe;
execl(p.c_str(), exe.c_str(), "--start-server", "--config_file_path",
get_config_file_path().c_str(), "--data_folder_path",
5 changes: 2 additions & 3 deletions engine/controllers/engines.cc
@@ -23,10 +23,9 @@ std::string NormalizeEngine(const std::string& engine) {
void Engines::ListEngine(
const HttpRequestPtr& req,
std::function<void(const HttpResponsePtr&)>&& callback) const {
Json::Value ret;
auto engine_names = engine_service_->GetSupportedEngineNames().value();
for (const auto& engine : engine_names) {
auto installed_engines =
engine_service_->GetInstalledEngineVariants(engine);
if (installed_engines.has_error()) {
30 changes: 30 additions & 0 deletions engine/cortex-common/EngineI.h
@@ -1,14 +1,44 @@
#pragma once

#include <filesystem>
#include <functional>
#include <memory>

#include "json/value.h"
#include "trantor/utils/Logger.h"
class EngineI {
 public:
  struct RegisterLibraryOption {
    std::vector<std::filesystem::path> paths;
  };

  struct EngineLoadOption {
    // engine
    std::filesystem::path engine_path;
    std::filesystem::path cuda_path;
    bool custom_engine_path;

    // logging
    std::filesystem::path log_path;
    int max_log_lines;
    trantor::Logger::LogLevel log_level;
  };

  struct EngineUnloadOption {
    bool unload_dll;
  };

  virtual ~EngineI() {}

  /**
   * Being called before starting process to register dependencies search paths.
   */
  virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;

  virtual void Load(EngineLoadOption opts) = 0;

  virtual void Unload(EngineUnloadOption opts) = 0;

  // cortex.llamacpp interface
  virtual void HandleChatCompletion(
      std::shared_ptr<Json::Value> json_body,
