
feat(inference): introducing InferenceProviders #1161

Merged — 7 commits merged into containers:main on Jun 7, 2024

Conversation

axel7083 (Contributor) commented on Jun 5, 2024

What does this PR do?

This PR introduces the InferenceProvider interface, an abstraction used to create InferenceServers. In today's implementation we do not distinguish between backends (llamacpp, whispercpp, etc.).

This is the first step in abstracting the inference providers, making it easier to customize the inference server with new providers in the future (whispercpp, llamacpp-cuda, ollama, etc.).

Documentation

(screenshot of the documented InferenceProvider interface)

What issues does this PR fix or reference?

Fixes #1112

How to test this PR?

  • unit tests have been provided

Manually (recommended)

  • Create an inference server
  • Start chatbot recipe

@axel7083 axel7083 requested review from benoitf and a team as code owners June 5, 2024 12:04
@axel7083 axel7083 requested review from lstocchi, jeffmaury and feloy June 5, 2024 12:04
feloy (Contributor) commented on Jun 5, 2024

In the near future we will want to run inference servers in Kubernetes clusters too (typically OpenShift AI). I cannot see anything in this architecture that would block that, but I just want to be sure you have this scenario in mind.

axel7083 (Contributor, Author) commented on Jun 5, 2024

In the near future we will want to run inference servers in Kubernetes clusters too (typically OpenShift AI). I cannot see anything in this architecture that would block that, but I just want to be sure you have this scenario in mind.

Thanks @feloy for this feedback. I did not have this in mind, but I think this would make it easier than our current architecture, as we could create a KubernetesInferenceProvider responsible for creating the pod in the Kubernetes cluster.

lstocchi (Contributor) left a comment

Codewise LGTM. Also tested and works fine. Nice job!!

jeffmaury (Contributor) left a comment

The dependency on Podman Desktop 1.11 does not seem to be an absolute requirement and could be relaxed with a few changes, so I would delay the merge, as @slemeur's approval should be required.

axel7083 (Contributor, Author) commented on Jun 6, 2024

The dependency on Podman Desktop 1.11 does not seem to be an absolute requirement and could be relaxed with a few changes, so I would delay the merge, as @slemeur's approval should be required.

@jeffmaury I reverted to @podman-desktop/api 1.10.3. As mentioned, it is not an absolute requirement.

@axel7083 axel7083 requested a review from jeffmaury June 6, 2024 10:06
@axel7083 axel7083 force-pushed the feature/inference-provider branch from 7086f1c to f9f3c98 Compare June 6, 2024 14:01
@axel7083 axel7083 merged commit d2ea36b into containers:main Jun 7, 2024
4 checks passed
@axel7083 axel7083 mentioned this pull request Jun 10, 2024
1 task
Successfully merging this pull request may close: InferenceServer should support multiple backend (#1112)