
fix: update image to support usage info #81

Merged
5 commits merged into containers:main on Dec 6, 2024

Conversation

jeffmaury
Contributor

@jeffmaury jeffmaury commented Nov 27, 2024

What does this PR do?

Updates the inference server to return usage statistics (token counts) in streaming mode.

Screenshot / video of UI

N/A

What issues does this PR fix or reference?

Fixes containers/podman-desktop-extension-ai-lab#1730

How to test this PR?

  1. Build the image
  2. Update the inference server.json file
  3. Start the inference server for a model
  4. Execute a curl query with streaming mode activated and usage info requested
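Step 4 can be sketched with a request like the one below. This is only a sketch: the port, endpoint path, and model name are assumptions (llama-cpp-python serves an OpenAI-compatible `/v1/chat/completions` endpoint, and `stream_options.include_usage` is the OpenAI-style flag that requests usage info in the final streamed chunk).

```shell
# Hypothetical example: adjust host, port, and model name to your setup.
curl -sS http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
```

With `include_usage` set, the last data chunk before `[DONE]` should carry a `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`) in addition to the streamed deltas.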

@jeffmaury jeffmaury requested a review from a team as a code owner November 27, 2024 09:48
@axel7083
Contributor

Referencing abetlen/llama-cpp-python#1552 since we are using it

Contributor

@axel7083 axel7083 left a comment


I honestly have serious doubts about the longevity of such images. I don't think we should build too much on the llama-cpp-python project in the future.

chat/setup.sh (review comment, outdated and resolved)
Signed-off-by: Jeff MAURY <[email protected]>
@jeffmaury jeffmaury merged commit ad0b3ec into containers:main Dec 6, 2024
6 checks passed
@jeffmaury jeffmaury deleted the GH-1730 branch December 6, 2024 12:27
Successfully merging this pull request may close these issues.

Expose metrics in the inference server API