feat(inference): introducing InferenceProviders #1161
Conversation
In the near future we will want to run inference servers in Kubernetes clusters too (OpenShift AI, typically). I cannot see anything in this architecture that blocks doing this, but I just want to be sure you have this scenario in mind.
Thanks @feloy for this feedback. I did not have this in mind, but I think this architecture would make it easier than our current one, as we could create a …
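For illustration only, here is a minimal TypeScript sketch of how a Kubernetes-backed provider could plug into such an abstraction; `KubernetesInferenceProvider`, `InferenceServerConfig`, and the `create` signature are all invented names, not code from this PR:

```typescript
// Hypothetical sketch: every name below is invented for illustration
// and does not come from this PR.
interface InferenceServerConfig {
  modelPath: string;
  port: number;
}

abstract class InferenceProvider {
  protected constructor(readonly name: string) {}
  // Create and start an inference server for the given configuration.
  abstract create(config: InferenceServerConfig): Promise<void>;
}

class KubernetesInferenceProvider extends InferenceProvider {
  constructor() {
    super('kubernetes');
  }

  async create(config: InferenceServerConfig): Promise<void> {
    // Instead of starting a local container, this variant would create
    // a Deployment and Service in the target cluster (e.g. OpenShift AI).
  }
}
```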
Codewise LGTM. Also tested and works fine. Nice job!!
The dependency on Podman Desktop 1.11 does not seem an absolute requirement and could be relaxed with a few changes, so I would delay the merge; @slemeur's approval should be required.
@jeffmaury I reverted to …
Signed-off-by: axel7083 <[email protected]>
Force-pushed from 7086f1c to f9f3c98
What does this PR do?
Introducing the `InferenceProvider` interface, an abstraction class used to create inference servers. In today's implementation we make no distinction between backends (llamacpp, whispercpp, etc.). This is the first step in abstracting the inference providers, to ease the customization of inference servers with new providers in the future (whispercpp, llamacpp-cuda, ollama, etc.); see the sketch below.
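Roughly, the shape could look like the following sketch (illustrative TypeScript only; the actual interface, method names, and types in this PR may differ):

```typescript
// Illustrative sketch of the idea, not the actual code from this PR:
// each backend ships its own provider, and the manager selects one by
// name instead of hard-coding llamacpp.
interface InferenceServerConfig {
  modelPath: string;
  port: number;
}

interface InferenceServer {
  containerId: string;
}

abstract class InferenceProvider {
  protected constructor(readonly name: string) {}

  // Whether this provider can run in the current environment
  // (e.g. a CUDA provider only when a GPU is available).
  abstract enabled(): boolean;

  // Create and start an inference server for the given configuration.
  abstract create(config: InferenceServerConfig): Promise<InferenceServer>;
}

class LlamaCppProvider extends InferenceProvider {
  constructor() {
    super('llamacpp');
  }

  enabled(): boolean {
    return true;
  }

  async create(config: InferenceServerConfig): Promise<InferenceServer> {
    // A real implementation would pull the llamacpp server image and
    // start a container exposing config.port with the model mounted.
    return { containerId: `llamacpp-${config.port}` };
  }
}

// Registry the inference manager could consult when creating a server.
const providers = new Map<string, InferenceProvider>([
  ['llamacpp', new LlamaCppProvider()],
]);
```

Under this sketch, adding whispercpp, llamacpp-cuda, or ollama support would mean registering another `InferenceProvider` implementation rather than changing the manager itself.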
Documentation
What issues does this PR fix or reference?
Fixes #1112
How to test this PR?
Manually (recommended)