Set NVIDIA_DRIVER_CAPABILITIES to all when GPU is enabled #19345

Merged 1 commit on Aug 20, 2024
2 changes: 1 addition & 1 deletion pkg/drivers/kic/oci/oci.go
@@ -191,7 +191,7 @@ func CreateContainerNode(p CreateParams) error { //nolint to suppress cyclomatic
 		runArgs = append(runArgs, "--ip", p.IP)
 	}
 	if p.GPUs != "" {
-		runArgs = append(runArgs, "--gpus", "all")
+		runArgs = append(runArgs, "--gpus", "all", "--env", "NVIDIA_DRIVER_CAPABILITIES=all")
Member

Thank you for adding the example. I found the documentation on this: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html

You are spot on! It says "empty or unset | use default driver capability: utility, compute". I would love to see the example you provided added as an integration test, with the condition that the test is skipped if there is no GPU on the machine, to avoid spamming failures on our CI machines.
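
To make the behavior discussed above concrete, here is a minimal sketch of the argument construction. `gpuRunArgs` is a hypothetical helper written only for illustration; the actual change is the single `append` call in `CreateContainerNode` shown in the diff.

```go
package main

import "fmt"

// gpuRunArgs is a hypothetical helper (not part of minikube). It mirrors the
// changed line in the diff: when GPUs are requested, pass "--gpus all" and set
// NVIDIA_DRIVER_CAPABILITIES=all explicitly, because an empty or unset value
// falls back to the default capabilities "utility, compute".
func gpuRunArgs(gpus string) []string {
	if gpus == "" {
		// No GPUs requested: add nothing to the container run arguments.
		return nil
	}
	return []string{"--gpus", "all", "--env", "NVIDIA_DRIVER_CAPABILITIES=all"}
}

func main() {
	// The base arguments here are placeholders for illustration only.
	runArgs := []string{"run", "-d"}
	runArgs = append(runArgs, gpuRunArgs("all")...)
	fmt.Println(runArgs)
}
```

Keeping the flag and the environment variable in the same branch means the variable is only set when a GPU was actually requested.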

Contributor Author

Sure thing. I'll study how integration tests are implemented a bit and try to do that.

Member

@chubei-urus here is an example of an integration test:

https://github.com/medyagh/minikube/blob/abcff1741451c3867f80277115029457ad4fd23f/test/integration/start_stop_delete_test.go#L43

You can simply create a new file called
test/integration/gpu_ml_test.go

and create a new test there.

Then you can add an if statement to skip the test when a GPU is not available on the test machine, for example:
if !hasGPU {
	t.Skip("skipping test since the test machine does not have a GPU")
}

By the way, this would also be a good idea for a follow-up PR: if the user's machine does not have a GPU and they try to enable one, we could warn them that they are passing --gpus without a GPU available.

Let me know if you have any questions.
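
One possible shape for the follow-up warning, as a sketch only: `gpuWarning` is a hypothetical function, and how GPU availability is detected is left to the caller as an assumption.

```go
package main

import "fmt"

// gpuWarning is a hypothetical helper (not part of minikube). It returns a
// warning message when GPUs were requested but none was detected, and an
// empty string otherwise.
func gpuWarning(requestedGPUs string, gpuAvailable bool) string {
	if requestedGPUs != "" && !gpuAvailable {
		return fmt.Sprintf("--gpus=%s was requested, but no GPU was detected on this machine", requestedGPUs)
	}
	return ""
}

func main() {
	// gpuAvailable is hard-coded to false here purely for demonstration.
	if msg := gpuWarning("all", false); msg != "" {
		fmt.Println("warning:", msg)
	}
}
```

Returning the message rather than printing it keeps the check easy to test and lets the caller decide how to surface it.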

 	}

 	memcgSwap := hasMemorySwapCgroup()