Home
earonesty edited this page Sep 18, 2023 · 9 revisions
The AI "worker" operates at a high level, responding to model-inference and fine-tuning requests for specific models.
Currently we only support Llama 2, via the llama-cpp-python library, because it is fast, compatible with any platform and GPU, feature-complete, and handles model splitting. Stable Diffusion will likely be the next target.
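Workers reply in the OpenAI response format, which is also the shape that llama-cpp-python's `create_chat_completion()` produces. A minimal sketch of that reply shape (all field values here are illustrative, not real output):

```python
import json

# Illustrative OpenAI-style chat-completion reply, the shape a worker
# forwards back over the websocket; every value below is an example only.
reply = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1695000000,
    "model": "llama-2-7b-chat",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11},
}

# The worker serializes this dict and sends it back to the spider.
print(json.dumps(reply)[:40])
```

Because the library already speaks this format, the spider can translate a worker reply into a REST response without reshaping it.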
```mermaid
graph TD
    subgraph "Many Workers"
        W1[Connect to Spider URL via WebSockets]
        W2[Send a Registration Message]
        W3[Listen for Inference & Training WebSocket Requests]
        W4[Reply with OpenAI-Formatted Response]
        W1 --> W2
        W2 --> W3
        W3 --> W4
    end
    subgraph "One Spider"
        S1[Listen for Worker Connections]
        S2[Accumulate List of Active Workers]
        S3[Listen for Inference, Training REST Requests]
        S4[Pick an Available Worker]
        S5[Send Request Over a WebSocket]
        S6[Translate Reply Into REST Response]
        S7[Complete Any Billing]
        S1 --> S2
        S2 --> S3
        S3 --> S4
        S4 --> S5
        S5 --> S6
        S6 --> S7
    end
    subgraph "Many Clients"
        C1[Make REST Requests Against the Spider]
    end
    %% Interactions
    W1 -.-> S1
    S2 -.-> W2
    S3 -.-> C1
    C1 -.-> S3
    W3 -.-> S5
    S4 -.-> W4
    W4 -.-> S6
```
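The worker side of the diagram (connect, register, answer requests, reply in OpenAI format) can be sketched as an async loop. This is a hypothetical sketch using stdlib asyncio with in-memory queues standing in for the websocket; the message shapes and field names (`type`, `register`, `models`, `id`) are assumptions, not the real wire format:

```python
import asyncio
import json

async def worker_loop(recv, send):
    # Step 1-2: connect (implied by having recv/send) and register with the spider.
    await send(json.dumps({"type": "register", "models": ["llama2"]}))
    # Step 3: listen for inference/training requests until the socket closes.
    while True:
        raw = await recv()
        if raw is None:  # connection closed
            break
        req = json.loads(raw)
        # Step 4: run inference (stubbed here) and reply in OpenAI format.
        reply = {
            "id": req.get("id"),
            "object": "chat.completion",
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": "(model output here)"},
                "finish_reason": "stop",
            }],
        }
        await send(json.dumps(reply))

async def demo():
    # Queues stand in for the two directions of the websocket.
    to_worker: asyncio.Queue = asyncio.Queue()
    from_worker: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(worker_loop(to_worker.get, from_worker.put))
    registration = json.loads(await from_worker.get())
    await to_worker.put(json.dumps({"id": "req-1", "messages": [
        {"role": "user", "content": "hi"}]}))
    reply = json.loads(await from_worker.get())
    await to_worker.put(None)  # simulate the spider closing the connection
    await task
    return registration, reply

registration, reply = asyncio.run(demo())
print(registration["type"], reply["id"])
```

The spider's side mirrors this: it holds one such connection per registered worker, picks an idle one per REST request, and translates the JSON reply into the REST response.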