Commit: add some REST docs
mmoskal committed Jan 25, 2024
1 parent 9b2ebf9 commit d144ee4
Showing 4 changed files with 223 additions and 4 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -6,7 +6,7 @@ Controllers are light-weight WebAssembly (Wasm) modules
which run on the same machine as the LLM inference engine, utilizing the CPU while the GPU is busy
with token generation.

-AICI is meant to run both locally and in the cloud, including multi-tenant LLM deployments.
+AICI is meant to run both locally and in the cloud, including (eventually) multi-tenant LLM deployments.
Guidance, LMQL, and other LLM control libraries are expected to run on top of AICI.

AICI is a prototype, designed and built at [Microsoft Research](https://www.microsoft.com/en-us/research/).
@@ -22,6 +22,7 @@ AICI is:
This repository contains:

- [definition](aici_abi/README.md#low-level-interface) of the AICI binary interface
+- [REST API definition](REST.md) for AICI Server
- [aici_abi](aici_abi) - a Rust crate for easily implementing controllers (Wasm modules adhering to AICI)
- [aicirt](aicirt) - an implementation of a runtime for running controllers,
  built on top of Wasmtime;
202 changes: 202 additions & 0 deletions REST.md
@@ -0,0 +1,202 @@
# REST APIs for AICI

The AICI server exposes REST APIs for uploading and tagging controllers (`.wasm` files),
and extends the "completions" REST API to allow running the controllers.

## Uploading a Controller

To upload a controller, POST it to `/v1/aici_modules`.
Note that the body is the raw binary `.wasm` file, not JSON-encoded.
The returned `module_id` is the SHA256 hash of the `.wasm` file.

```json
// POST /v1/aici_modules
// ... binary of Wasm file ...
// 200 OK
{
"module_id": "44f595216d8410335a4beb1cc530321beabe050817b41bf24855c4072c2dde2d",
"wasm_size": 3324775,
"compiled_size": 11310512,
"time": 393
}
```
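
For illustration, the upload can be scripted in a few lines of Python.
This is a minimal sketch, not part of the API above: the `BASE_URL`
(and any authentication headers your deployment may require) are assumptions.

```python
# Minimal upload sketch using the `requests` library.
# BASE_URL is an assumption -- point it at your AICI server.
import hashlib
import requests

BASE_URL = "http://127.0.0.1:4242"  # hypothetical address

def upload_controller(wasm_path: str) -> str:
    with open(wasm_path, "rb") as f:
        wasm_bytes = f.read()
    # The request body is the raw .wasm binary, not JSON.
    resp = requests.post(f"{BASE_URL}/v1/aici_modules", data=wasm_bytes)
    resp.raise_for_status()
    module_id = resp.json()["module_id"]
    # The module_id is the SHA256 of the file, so it can be checked locally.
    assert module_id == hashlib.sha256(wasm_bytes).hexdigest()
    return module_id
```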

## Running a Controller

This API is similar to [OpenAI's Completion API](https://platform.openai.com/docs/api-reference/completions),
with added `aici_module` and `aici_arg` parameters.
The `aici_module` parameter specifies the controller to run, either by its hex `module_id`
or by a tag name (see below).
The `aici_arg` parameter is the argument passed to the controller; it can be either a JSON object
(which will be encoded as a string) or a JSON string (which will be passed as is).
For example, `jsctrl` expects a string argument containing the JavaScript program to execute.
When using AICI Controllers, the prompt is often empty.

```json
// POST /v1/completions
{
"model": "",
"prompt": "Ultimate answer is to the life, universe and everything is ",
"max_tokens": 2000,
"n": 1,
"temperature": 0.0,
"stream": true,
"aici_module": "jsctrl-latest",
"aici_arg": "async function main() { await gen({ regex: /\\d\\d/, storeVar: \"answer\" }) }\nstart(main)",
"ignore_eos": true
}
```

```
200 OK
data: {"object":"text_completion","id":"cmpl-a9997cb5-...
data: {"object":"text_completion","id":"cmpl-a9997cb5-...
...
data: [DONE]
```
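
For illustration, here is a minimal sketch of issuing the request and reading the
server-sent-events stream with the `requests` library. The `BASE_URL` is an assumption,
and the parsing (strip the `data: ` prefix, stop at `[DONE]`) mirrors what `pyaici/rest.py` does.

```python
# Minimal streaming sketch; BASE_URL is an assumption.
import json
import requests

BASE_URL = "http://127.0.0.1:4242"  # hypothetical address

def run_controller(aici_module: str, aici_arg, prompt: str = ""):
    body = {
        "model": "",
        "prompt": prompt,
        "max_tokens": 2000,
        "temperature": 0.0,
        "stream": True,
        "aici_module": aici_module,
        "aici_arg": aici_arg,
    }
    with requests.post(f"{BASE_URL}/v1/completions", json=body, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            decoded = line.decode("utf-8")
            if decoded.startswith("data: [DONE]"):
                break
            if decoded.startswith("data: {"):
                # Each event carries one JSON-encoded text_completion object.
                yield json.loads(decoded[len("data: "):])
```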

Each streamed `text_completion` object looks like the examples below.
Fields that are new compared to OpenAI's API are:

- `logs` - console output of the controller
- `storage` - list of storage operations performed by the controller (one way of extracting its result;
  see the decoding sketch after the example responses); the `value` in a `WriteVar` operation is a hex-encoded byte string
- `error` - set when an error occurs
- `usage` - usage statistics

```json
{
"object": "text_completion",
"id": "cmpl-a9997cb5-01fd-4d0b-a194-ae945eaf7d57",
"model": "microsoft/Orca-2-13b",
"created": 1706141817,
"choices": [
{
"index": 0,
"finish_reason": null,
"text": "4",
"error": "",
"logs": "GEN {regex: /\\d\\d/, storeVar: answer}\nregex constraint: \"\\\\d\\\\d\"\ndfa: 160 bytes\n",
"storage": []
}
],
"usage": {
"completion_tokens": 1,
"prompt_tokens": 15,
"total_tokens": 16,
"fuel_tokens": 17
}
}
```

```json
{
"object": "text_completion",
"id": "cmpl-a9997cb5-01fd-4d0b-a194-ae945eaf7d57",
"model": "microsoft/Orca-2-13b",
"created": 1706141817,
"choices": [
{
"index": 0,
"finish_reason": null,
"text": "2",
"error": "",
"logs": "GEN [29946, 29906] b\"42\"\nJsCtrl: done\n",
"storage": [
{
"WriteVar": {
"name": "answer",
"value": "3432",
"op": "Set",
"when_version_is": null
}
}
]
}
],
"usage": {
"completion_tokens": 2,
"prompt_tokens": 16,
"total_tokens": 18,
"fuel_tokens": 20
}
}
```

```json
{
"object": "text_completion",
"id": "cmpl-a9997cb5-01fd-4d0b-a194-ae945eaf7d57",
"model": "microsoft/Orca-2-13b",
"created": 1706141817,
"choices": [
{
"index": 0,
"finish_reason": "aici-stop",
"text": " ",
"error": "",
"logs": "",
"storage": []
}
],
"usage": {
"completion_tokens": 3,
"prompt_tokens": 17,
"total_tokens": 20,
"fuel_tokens": 23
}
}
```
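
One way to consume these responses is to concatenate the `text` fields and collect the
`WriteVar` storage operations, hex-decoding their values. Below is a minimal sketch;
`collect_vars` is a hypothetical helper, not part of the API.

```python
# Collect controller results from a list of streamed text_completion objects.
# WriteVar values are hex-encoded byte strings.
def collect_vars(responses) -> dict:
    out = {}
    for resp in responses:
        for choice in resp.get("choices", []):
            for op in choice.get("storage", []):
                if "WriteVar" in op:
                    wv = op["WriteVar"]
                    out[wv["name"]] = bytes.fromhex(wv["value"])
    return out

# For the responses above: bytes.fromhex("3432") == b"42",
# so the result is {"answer": b"42"}.
```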

## Tags

You can tag a `module_id` with one or more tags:

```json
// POST /v1/aici_modules/tags
{
"module_id": "44f595216d8410335a4beb1cc530321beabe050817b41bf24855c4072c2dde2d",
"tags": ["jsctrl-test"]
}
// 200 OK
{
"tags": [
{
"tag": "jsctrl-test",
"module_id": "44f595216d8410335a4beb1cc530321beabe050817b41bf24855c4072c2dde2d",
"updated_at": 1706140462,
"updated_by": "mimoskal",
"wasm_size": 3324775,
"compiled_size": 11310512
}
]
}
```
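
For illustration, a minimal sketch of tagging a module with the `requests` library
(the `BASE_URL` is again an assumption):

```python
# Minimal tagging sketch; BASE_URL is an assumption.
import requests

BASE_URL = "http://127.0.0.1:4242"  # hypothetical address

def tag_module(module_id: str, tags: list) -> None:
    resp = requests.post(
        f"{BASE_URL}/v1/aici_modules/tags",
        json={"module_id": module_id, "tags": tags},
    )
    resp.raise_for_status()
    for t in resp.json()["tags"]:
        print(f"{t['tag']} -> {t['module_id']}")
```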

You can also list all existing tags:

```json
// GET /v1/aici_modules/tags
// 200 OK
{
"tags": [
{
"tag": "pyctrl-v0.0.3",
"module_id": "41bc81f0ce56f2add9c18e914e30919e6b608c1eaec593585bcebd61cc1ba744",
"updated_at": 1705629923,
"updated_by": "mimoskal",
"wasm_size": 13981950,
"compiled_size": 42199432
},
{
"tag": "pyctrl-latest",
"module_id": "41bc81f0ce56f2add9c18e914e30919e6b608c1eaec593585bcebd61cc1ba744",
"updated_at": 1705629923,
"updated_by": "mimoskal",
"wasm_size": 13981950,
"compiled_size": 42199432
},
...
]
}
```
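
A tag can be resolved back to its `module_id` by scanning this list; a minimal sketch
(same `BASE_URL` assumption as above):

```python
# Minimal tag-resolution sketch; BASE_URL is an assumption.
from typing import Optional

import requests

BASE_URL = "http://127.0.0.1:4242"  # hypothetical address

def resolve_tag(tag: str) -> Optional[str]:
    resp = requests.get(f"{BASE_URL}/v1/aici_modules/tags")
    resp.raise_for_status()
    for t in resp.json()["tags"]:
        if t["tag"] == tag:
            return t["module_id"]
    return None
```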
6 changes: 5 additions & 1 deletion pyaici/rest.py
@@ -64,7 +64,10 @@ def req(tp: str, url: str, **kwargs):
print(f"{tp.upper()} {url} headers={headers}")
if "json" in kwargs:
print(json.dumps(kwargs["json"]))
return requests.request(tp, url, headers=headers, **kwargs)
resp = requests.request(tp, url, headers=headers, **kwargs)
if log_level >= 4 and "stream" not in kwargs:
print(f"{resp.status_code} {resp.reason}: {resp.text}")
return resp


def upload_module(file_path: str) -> str:
@@ -160,6 +163,7 @@ def completion(
        if not line:
            continue
        decoded_line: str = line.decode("utf-8")
+        # print(decoded_line)
        if decoded_line.startswith("data: {"):
            d = json.loads(decoded_line[6:])
            full_resp.append(d)
16 changes: 14 additions & 2 deletions pyctrl/samples/test.py
@@ -163,5 +163,17 @@ async def test_eos():
    await aici.gen_tokens(regex=r' "[^"]+"', max_tokens=6, store_var="french")
    aici.check_vars({"french": ' "bonjour"'})


-aici.test(test_fork())
+async def test_joke():
+    await aici.FixedTokens("Do you want a joke or a poem? A")
+    answer = await aici.gen_text(options=[" joke", " poem"])
+    if answer == " joke":
+        await aici.FixedTokens("\nHere is a one-line joke about cats: ")
+    else:
+        await aici.FixedTokens("\nHere is a one-line poem about dogs: ")
+    await aici.gen_text(regex="[A-Z].*", stop_at="\n", store_var="result")
+    print("explaining...")
+    await aici.FixedTokens("\nLet me explain it: ")
+    await aici.gen_text(max_tokens=15)


+aici.test(test_joke())
