
feat: initial support for GPU on linux (TRACKING, replaced by separate PRs) #2180

Open
wants to merge 1 commit into main

Conversation

mhdawson
Contributor

@mhdawson mhdawson commented Nov 29, 2024

What does this PR do?

Adds experimental support for GPU acceleration on Linux. It makes a key assumption that needs to be validated by people who know more about
the podman-desktop-extension-ai-lab than I do:

  • if the vmType is VMType.UNKNOWN, the assumption is that we are running on Linux and the VM type is unknown because there is no VM. This seemed reasonable to me, since we should always know the vmType on Windows or macOS. (A sketch of this check follows below.)
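
A minimal sketch of that assumption, with illustrative names (the enum values and helper are placeholders, not necessarily the extension's actual API):

```ts
// Illustrative only: treat VMType.UNKNOWN as "no VM, podman runs natively on Linux".
enum VMType {
  WSL = 'wsl',
  HYPERV = 'hyperv',
  APPLEHV = 'applehv',
  QEMU = 'qemu',
  UNKNOWN = 'unknown',
}

function isNativeLinuxPodman(vmType: VMType): boolean {
  // On Windows and macOS podman runs inside a machine whose VM type should be
  // known, so UNKNOWN is taken to mean podman runs directly on the Linux host.
  return vmType === VMType.UNKNOWN;
}
```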

There were also a few tweaks that were needed for my system:

  1. a change to look for the first GPU whose type we recognize. This was needed because my VM has 2 GPUs, one of which is NVIDIA, but it was not gpu[0]. This should have no effect for people who only have 1 GPU (see the sketch after this list).
  2. a tweak to recognize an additional string as NVIDIA, since that was what was being reported on my machine.
  3. one fix where vmType was being left as undefined instead of being set to VMType.UNKNOWN.
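
A rough sketch of what tweak 1 means in practice (the interface and vendor matching here are assumptions, not the extension's exact code):

```ts
// Hypothetical sketch: scan all detected graphics controllers and return the
// first one whose vendor/model string we recognize, instead of always using
// index 0.
interface GraphicsController {
  vendor: string;
  model: string;
}

function findFirstSupportedGPU(controllers: GraphicsController[]): GraphicsController | undefined {
  const isKnownVendor = (gpu: GraphicsController): boolean =>
    /nvidia/i.test(gpu.vendor) || /nvidia/i.test(gpu.model);
  return controllers.find(isKnownVendor);
}
```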

It also requires some extensions to Podman Desktop, which are in PR podman-desktop/podman-desktop#10166.

Screenshot / video of UI

N/A

What issues does this PR fix or reference?

#2162

How to test this PR?

For a live test, I tested this PR together with the Podman Desktop PR podman-desktop/podman-desktop#10166 by creating a model service and validating that the service reported GPU acceleration and used the GPU when requests were submitted to it. I also tested the chatbot Node.js recipe I was working on and confirmed that it got GPU acceleration as well.


Inline review comment on this diff hunk:

```ts
supported = true;
devices.push({
  PathOnHost: 'nvidia.com/gpu=all',
```
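
For context, 'nvidia.com/gpu=all' is a fully qualified CDI device name. A hedged sketch of how such an entry could be added to a dockerode-style Devices list (the field names and helper are assumptions, not the PR's exact code):

```ts
// Sketch only: PathInContainer and CgroupPermissions are left empty because
// the CDI name is resolved by podman on the host.
interface DeviceMapping {
  PathOnHost: string;
  PathInContainer: string;
  CgroupPermissions: string;
}

function nvidiaCdiDevice(): DeviceMapping {
  return {
    PathOnHost: 'nvidia.com/gpu=all', // CDI device name covering all NVIDIA GPUs
    PathInContainer: '',
    CgroupPermissions: '',
  };
}

const devices: DeviceMapping[] = [];
let supported = false;

// In the spirit of the diff above: when an NVIDIA GPU is usable via CDI,
// mark acceleration as supported and attach the CDI device.
supported = true;
devices.push(nvidiaCdiDevice());
```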
Contributor

@axel7083 axel7083 Dec 2, 2024


Does this need the NVIDIA Container Device Interface (CDI) to be installed?

This is a bit problematic, as today we cannot detect the devices installed on the podman machine without some hacky stuff.

Contributor Author


@axel7083 do you mean detection on the local machine instead of the podman machine? It looks to me like on Linux there is no podman machine, since podman runs natively on the local machine.

Contributor


Yes, but you still need to check that CDI is installed on the system?

Contributor Author

@mhdawson mhdawson Dec 4, 2024


We might be able to do that by looking for the files that you need to generate for CDI. Following

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#procedure

on my Fedora system that file is /etc/cdi/nvidia.yaml.

Since the documentation says the location depends on the container engine you use, we could possibly just check /etc/cdi/nvidia.yaml on the assumption that it is the one used for podman, and add additional places to check later if needed.

If a check for the existence of that file would be enough, I can look at adding it.
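
A minimal sketch of what that existence check could look like (the path list and helper name are assumptions drawn from this thread, not code from the PR):

```ts
import { promises as fs } from 'node:fs';

// Assume CDI is configured for podman if an NVIDIA CDI spec file exists at a
// known location; more candidate paths could be appended later if needed.
const NVIDIA_CDI_SPEC_PATHS = ['/etc/cdi/nvidia.yaml'];

async function isNvidiaCdiConfigured(): Promise<boolean> {
  for (const specPath of NVIDIA_CDI_SPEC_PATHS) {
    try {
      await fs.access(specPath); // throws if the file does not exist
      return true;
    } catch {
      // not found or unreadable; try the next candidate path
    }
  }
  return false;
}
```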

Contributor

@axel7083 axel7083 left a comment


Sorry for the late review, it has been a long week!

Could you split the PR in two? Currently there are two distinct features:

  1. finding the first supported GPU among multiple GPUs
  2. enabling GPU acceleration on Linux

The first one is easier, so we could simplify this PR by merging it first, and later on see what we can do for Linux. Is that okay for you?

@mhdawson
Contributor Author

mhdawson commented Dec 5, 2024

Sure, I will split it; the one for Linux GPU acceleration needs a few more PRs to land in Podman Desktop first anyway.
It will likely be late Friday or Monday when I submit the split PRs.

In terms of checking whether CDI is configured, do you think checking for /etc/cdi/nvidia.yaml makes sense?

@axel7083
Contributor

axel7083 commented Dec 9, 2024

In terms of checking whether CDI is configured, do you think checking for /etc/cdi/nvidia.yaml makes sense?

I think we could check only for that at first, yes, and if we get more feedback in the future we can expand the paths to check.

@mhdawson
Contributor Author

mhdawson commented Dec 9, 2024

@axel7083 the PR which includes just the support for finding the first supported GPU is #2238.

@mhdawson
Contributor Author

mhdawson commented Dec 9, 2024

@axel7083 and the second PR, with just the changes to enable GPU on Linux, is #2240.

@mhdawson mhdawson changed the title feat: initial support for GPU on linux feat: initial support for GPU on linux (TRACKING, replaced by separate PRs) Dec 9, 2024