feat: initial support for GPU on linux (TRACKING, replaced by separate PRs) #2180
base: main
Conversation
Signed-off-by: Michael Dawson <[email protected]>
supported = true;
devices.push({
  PathOnHost: 'nvidia.com/gpu=all',
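For context on the `nvidia.com/gpu=all` value above: with a CDI spec generated on the host, that string is a fully qualified CDI device name that podman accepts directly, as documented by NVIDIA. A hedged example (assumes the NVIDIA drivers and Container Toolkit are installed on the host):

```shell
# Request every GPU described by the host's CDI spec inside the container
# and list them with nvidia-smi (example from the NVIDIA CDI docs).
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
```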
Does this need the NVIDIA Container Device Interface (CDI) installed?
This is a bit problematic: today we cannot detect the devices available on the podman machine without some hacky workarounds.
- machine should expose the devices available podman#24042
- feat(GPUManager): check nvidia container toolkit capabilities #1825 (tentative to detect the CDI)
@axel7083 do you mean detection on the local machine instead of the podman machine? It looks to me like there is no podman machine on Linux, since podman runs natively on the local machine.
Yes, but you still need to check that CDI is installed on the system, right?
We might be able to do that by looking for the files that you need to generate for CDI. From
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#procedure
on my Fedora system that is /etc/cdi/nvidia.yaml
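For reference, the procedure linked above generates that file with `nvidia-ctk`. A sketch of the setup steps on such a system (requires the NVIDIA drivers and Container Toolkit to be installed; not part of this PR):

```shell
# Generate the CDI specification describing the host's NVIDIA devices
# (from the NVIDIA Container Toolkit documentation).
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# List the device names the generated spec exposes, e.g. nvidia.com/gpu=0
nvidia-ctk cdi list
```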
Since the docs say the location depends on the container engine you use, we could start by checking only /etc/cdi/nvidia.yaml, on the assumption that that is the file podman uses, and add additional places to check later if needed.
If a check for the existence of that file would be enough, I can look at adding it.
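If that approach lands, the existence check could be as small as the sketch below (TypeScript, matching the extension's language). The helper name and the candidate path list are assumptions for illustration, not code from this PR:

```typescript
import { promises as fs } from 'node:fs';

// Candidate CDI spec locations. Only the path from the NVIDIA docs
// (as seen on Fedora) is listed for now; more can be appended later.
const CDI_SPEC_PATHS: string[] = ['/etc/cdi/nvidia.yaml'];

// Hypothetical helper: returns true when an NVIDIA CDI spec file exists.
async function isNvidiaCdiConfigured(): Promise<boolean> {
  for (const specPath of CDI_SPEC_PATHS) {
    try {
      await fs.access(specPath); // throws if the file is missing/unreadable
      return true; // spec found, assume CDI is configured
    } catch {
      // try the next candidate path
    }
  }
  return false;
}
```

A plain existence check like this does not validate the spec's contents, but it matches the "check for the file first, refine later" approach discussed here.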
Sorry for the late review, this has been a long week!
Could you split the PR in two? Currently there are two distinct features:
- Selecting the first NVIDIA GPU when multiple are available (Select first compatible GPU when multiple are detected #2214)
- Adding support for Linux (Detecting GPU(s) on Linux (Name, VRAM) #2162)
The first one is easier, so we could simplify this PR by merging it first, and later on see what we can do for Linux. Is that okay with you?
Sure, I will split it; the one for Linux GPU acceleration needs a few more PRs to land in Podman Desktop first anyway. In terms of checking whether CDI is configured, do you think checking for /etc/cdi/nvidia.yaml would be enough?
I think we could start by only checking for that, yes, and if in the future we get more feedback, we can extend the paths to check.
What does this PR do?
Adds experimental support for GPU acceleration on Linux. It makes a key assumption which needs to be validated by people who know more about
the podman-desktop-extension-ai-lab than I do:
There were also a few tweaks that were needed for my system:
It also requires some extensions in Podman Desktop, which are in PR podman-desktop/podman-desktop#10166.
Screenshot / video of UI
N/A
What issues does this PR fix or reference?
#2162
How to test this PR?
For a live test, I combined this PR with podman-desktop/podman-desktop#10166: I created a model service and validated that the service reported GPU acceleration and used the GPU when requests were submitted to it. I also tested that the chatbot Node.js recipe I was working on got GPU acceleration as well.