
feat: initial support for GPU on linux (TRACKING, replaced by separate PRs) #2180

Open
wants to merge 1 commit into main

Conversation

mhdawson
Contributor

@mhdawson mhdawson commented Nov 29, 2024

What does this PR do?

Adds experimental support for GPU acceleration on Linux. It makes a key assumption that needs to be validated by people who know more about
the podman-desktop-extension-ai-lab than I do:

  • if the vmType is VMType.UNKNOWN, the assumption is that we are running on Linux and the VM type is unknown because there is no VM. This seemed reasonable to me, since we should always know the vmType on Windows or macOS. (A sketch of this check follows below.)
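
A minimal sketch of that assumption, with illustrative names (the enum values and helper are placeholders, not necessarily the extension's actual API):

```ts
// Illustrative only: treat VMType.UNKNOWN as "no VM, podman runs natively on Linux".
enum VMType {
  WSL = 'wsl',
  HYPERV = 'hyperv',
  APPLEHV = 'applehv',
  QEMU = 'qemu',
  UNKNOWN = 'unknown',
}

function isNativeLinuxPodman(vmType: VMType): boolean {
  // On Windows and macOS podman runs inside a machine whose VM type should be
  // known, so UNKNOWN is taken to mean podman runs directly on the Linux host.
  return vmType === VMType.UNKNOWN;
}
```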

There were also a few tweaks that were needed for my system:

  1. a change to look for the first GPU whose type we recognize. This was needed because my VM has 2 GPUs, one of which is NVIDIA, but it was not gpu[0]. This should have no effect for people who only have 1 GPU (see the sketch after this list).
  2. a tweak to recognize an additional string as NVIDIA, since that was what was being reported on my machine.
  3. one fix where vmType was being left as undefined instead of being set to VMType.UNKNOWN.
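
A rough sketch of what tweak 1 means in practice (the interface and vendor matching here are assumptions, not the extension's exact code):

```ts
// Hypothetical sketch: scan all detected graphics controllers and return the
// first one whose vendor/model string we recognize, instead of always using
// index 0.
interface GraphicsController {
  vendor: string;
  model: string;
}

function findFirstSupportedGPU(controllers: GraphicsController[]): GraphicsController | undefined {
  const isKnownVendor = (gpu: GraphicsController): boolean =>
    /nvidia/i.test(gpu.vendor) || /nvidia/i.test(gpu.model);
  return controllers.find(isKnownVendor);
}
```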

It also requires some extensions to Podman Desktop, which are in PR podman-desktop/podman-desktop#10166.

Screenshot / video of UI

N/A

What issues does this PR fix or reference?

#2162

How to test this PR?

For a live test, I tested this PR together with the Podman Desktop PR podman-desktop/podman-desktop#10166 by creating a model service and validating that the service reported GPU acceleration and used the GPU when requests were submitted to it. I also tested the chatbot Node.js recipe I was working on and confirmed that it got GPU acceleration as well.


Inline review comment on this diff hunk:

```ts
supported = true;
devices.push({
  PathOnHost: 'nvidia.com/gpu=all',
```
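
For context, 'nvidia.com/gpu=all' is a fully qualified CDI device name. A hedged sketch of how such an entry could be added to a dockerode-style Devices list (the field names and helper are assumptions, not the PR's exact code):

```ts
// Sketch only: PathInContainer and CgroupPermissions are left empty because
// the CDI name is resolved by podman on the host.
interface DeviceMapping {
  PathOnHost: string;
  PathInContainer: string;
  CgroupPermissions: string;
}

function nvidiaCdiDevice(): DeviceMapping {
  return {
    PathOnHost: 'nvidia.com/gpu=all', // CDI device name covering all NVIDIA GPUs
    PathInContainer: '',
    CgroupPermissions: '',
  };
}

const devices: DeviceMapping[] = [];
let supported = false;

// In the spirit of the diff above: when an NVIDIA GPU is usable via CDI,
// mark acceleration as supported and attach the CDI device.
supported = true;
devices.push(nvidiaCdiDevice());
```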
Contributor

@axel7083 axel7083 Dec 2, 2024


Does this need the NVIDIA Container Device Interface (CDI) to be installed?

This is a bit problematic, as today we cannot detect the devices installed on the podman machine without some hacky stuff.

Contributor Author


@axel7083 do you mean detection on the local machine instead of the podman machine? It looks to me like on Linux there is no podman machine, since podman runs natively on the local machine.

Contributor


Yes, but you still need to check that CDI is installed on the system?

Contributor Author

@mhdawson mhdawson Dec 4, 2024


We might be able to do that by looking for the files that you need to generate for CDI. Following

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#procedure

on my Fedora system that file is /etc/cdi/nvidia.yaml.

Since the documentation says the location depends on the container engine you use, we could possibly just check /etc/cdi/nvidia.yaml on the assumption that it is the one used for podman, and add additional places to check later if needed.

If a check for the existence of that file would be enough, I can look at adding it.
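
A minimal sketch of what that existence check could look like (the path list and helper name are assumptions drawn from this thread, not code from the PR):

```ts
import { promises as fs } from 'node:fs';

// Assume CDI is configured for podman if an NVIDIA CDI spec file exists at a
// known location; more candidate paths could be appended later if needed.
const NVIDIA_CDI_SPEC_PATHS = ['/etc/cdi/nvidia.yaml'];

async function isNvidiaCdiConfigured(): Promise<boolean> {
  for (const specPath of NVIDIA_CDI_SPEC_PATHS) {
    try {
      await fs.access(specPath); // throws if the file does not exist
      return true;
    } catch {
      // not found or unreadable; try the next candidate path
    }
  }
  return false;
}
```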

Contributor

@axel7083 axel7083 left a comment


Sorry for the late review, it has been a long week!

Could you split the PR in two? Currently there are two distinct features:

  1. finding the first supported GPU among multiple GPUs
  2. enabling GPU acceleration on Linux

The first one is easier, so we could simplify this PR by merging it first, and later on see what we can do for Linux. Is that okay for you?

@mhdawson
Contributor Author

mhdawson commented Dec 5, 2024

Sure, I will split it; the one for Linux GPU acceleration needs a few more PRs to land in Podman Desktop first anyway.
It will likely be late Friday or Monday when I submit the split PRs.

In terms of checking whether CDI is configured, do you think checking for /etc/cdi/nvidia.yaml makes sense?

@axel7083
Contributor

axel7083 commented Dec 9, 2024

In terms of checking whether CDI is configured, do you think checking for /etc/cdi/nvidia.yaml makes sense?

I think we could check only for that at first, yes, and if we get more feedback in the future we can expand the paths to check.

@mhdawson
Contributor Author

mhdawson commented Dec 9, 2024

@axel7083 the PR which includes just the support for finding the first supported GPU is #2238.

@mhdawson
Contributor Author

mhdawson commented Dec 9, 2024

@axel7083 and the second PR, with just the changes to enable GPU on Linux, is #2240.

@mhdawson mhdawson changed the title feat: initial support for GPU on linux feat: initial support for GPU on linux (TRACKING, replaced by separate PRs) Dec 9, 2024