Example Rocky Linux containers including the Nvidia driver #45

Merged
merged 1 commit into from
May 3, 2024

anderbubble
Collaborator

@anderbubble anderbubble commented Dec 30, 2023

Closes #31.

@anderbubble anderbubble added the enhancement New feature or request label Dec 30, 2023
@anderbubble anderbubble self-assigned this Dec 30, 2023
@anderbubble
Collaborator Author

@edvinas31 can you review these example containers? I don't have a relevant GPU node spun up to test on right now, but these appear to install and build the nvidia driver appropriately.

Let me know if you need help building containers from the PR.

@anderbubble
Collaborator Author

@brianphan I'd also be interested in your thoughts on these.

@edvinas31

Hi, I have already solved the nvidia driver installation using rc.d. I use a separate overlay for the GPU node and put the nvidia runtime file installation script inside rc.d for that overlay, and it is working. I am not sure how the steps you wrote here should work, because in my case I have only one GPU compute node and my head node does not have a GPU.

@anderbubble anderbubble force-pushed the nvidia branch 2 times, most recently from f9c074a to 3c694bb Compare May 3, 2024 20:45
@anderbubble
Collaborator Author

@edvinas31 sorry for not responding earlier.

Using this method, your head node does not need to have a GPU. You can just build this image, which builds the GPU driver against the kernel installed within the image, and then serve it to GPU-equipped compute nodes. Even if you serve it to a non-GPU compute node it will still work; the GPU driver just doesn't get loaded or used.
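
To illustrate why the head node's hardware does not matter here, the following is a minimal sketch of how a build step might find the kernel to build against. It is not the code from this PR's Containerfile. It assumes the image keeps its kernel module trees under `/lib/modules/<version>`, and it avoids `uname -r`, which reports the running kernel of the build host rather than the kernel installed in the image. The function names `installed_kernels` and `pick_build_kernel` are hypothetical, chosen only for this example.

```python
import os


def installed_kernels(modules_dir="/lib/modules"):
    """Return the kernel versions that have a module tree under modules_dir.

    This inspects files inside the image only, so the result does not
    depend on which kernel the build host happens to be running.
    """
    if not os.path.isdir(modules_dir):
        return []
    return sorted(
        name
        for name in os.listdir(modules_dir)
        if os.path.isdir(os.path.join(modules_dir, name))
    )


def pick_build_kernel(modules_dir="/lib/modules"):
    """Pick the kernel to build a driver against.

    Assumption for this sketch: the image has exactly one kernel installed.
    Anything else is treated as an error instead of guessing.
    """
    kernels = installed_kernels(modules_dir)
    if len(kernels) != 1:
        raise RuntimeError(
            f"expected exactly one installed kernel, found {kernels}"
        )
    return kernels[0]


if __name__ == "__main__":
    # Inside the image this prints the installed kernel version, which a
    # driver build step could then target explicitly.
    print(pick_build_kernel())
```

A driver build that targets the version returned by `pick_build_kernel()` produces modules for the kernel the compute nodes will actually boot, regardless of where the image was built.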

I've simplified this PR to provide an example for RL9 only, and slightly simplified the Containerfile as well. Since it's just an example, and I've just demonstrated that it works in a test environment, I'm going to go ahead and merge it for others to see.

@anderbubble anderbubble marked this pull request as ready for review May 3, 2024 22:40
@anderbubble anderbubble merged commit 5d95048 into warewulf:main May 3, 2024
14 checks passed
@anderbubble anderbubble deleted the nvidia branch May 3, 2024 22:41
Development

Successfully merging this pull request may close these issues.

Provide an example container image that shows how to install the Nvidia driver
2 participants