new flag "--gpus" to enable Nvidia container runtime #17314

spowelljr · 2023-09-27T20:39:55Z

Rework of #17287 that removes the nvidia-docker container-runtime and uses the --gpus flag instead.
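
As a rough illustration of what "uses the --gpus flag" can mean in practice, the sketch below builds a `docker run` argument list that includes Docker's `--gpus` option. The helper name `gpuRunArgs` and the base arguments are hypothetical and not taken from minikube's code; mapping both accepted values to `--gpus all` is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuRunArgs returns the extra arguments to append to a `docker run`
// invocation for the node container.
// Hypothetical helper for illustration; not minikube's implementation.
// Assumption: both accepted values ("all" and "nvidia") map to `--gpus all`.
func gpuRunArgs(gpus string) []string {
	switch gpus {
	case "all", "nvidia":
		return []string{"--gpus", "all"}
	default:
		// Flag not set: leave the docker command unchanged.
		return nil
	}
}

func main() {
	// Illustrative base arguments only; the real command has many more.
	args := []string{"run", "-d", "--name", "minikube"}
	args = append(args, gpuRunArgs("all")...)
	fmt.Println("docker " + strings.Join(args, " "))
	// prints: docker run -d --name minikube --gpus all
}
```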

$ minikube start --gpus all
😄  minikube v1.31.2 on Debian rodete
✨  Using the docker driver based on user configuration
📌  Using Docker driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=2, Memory=32100MB) ...
❗  Using GPUs with the Docker driver is experimental, if you experience any issues please report them at: https://github.com/kubernetes/minikube/issues/new/choose
🛠️   Installing the NVIDIA Container Toolkit...
🐳  Preparing Kubernetes v1.28.2 on Docker 24.0.6 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image nvcr.io/nvidia/k8s-device-plugin:v0.14.1
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, nvidia-device-plugin, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-version-check
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"
EOF
pod/nvidia-version-check created

$ kubectl logs nvidia-version-check
Fri Sep 22 18:45:31 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P1000        On   | 00000000:65:00.0 Off |                  N/A |
| 34%   26C    P8    N/A /  47W |     15MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

$ minikube start --driver kvm --gpus all
😄  minikube v1.31.2 on Debian rodete (kvm/amd64)
✨  Using the kvm2 driver based on user configuration

❌  Exiting due to MK_USAGE: The gpus flag can only be used with the docker driver and docker container-runtime

$ minikube start --gpus cat
😄  minikube v1.31.2 on Debian rodete
✨  Automatically selected the docker driver

❌  Exiting due to MK_USAGE: The gpus flag must be passed a value of "nvidia" or "all"
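
The two MK_USAGE failures above suggest a validation step along the following lines. This is only a sketch: the function name `validateGPUs`, its signature, and the order of the checks are assumptions, and only the error texts are copied from the output shown.

```go
package main

import (
	"errors"
	"fmt"
)

// validateGPUs checks the value of the gpus flag against the selected driver
// and container runtime. Hypothetical signature; the real check in
// cmd/minikube/cmd may be structured differently.
func validateGPUs(gpus, driver, containerRuntime string) error {
	if gpus == "" {
		// Flag not set: nothing to validate.
		return nil
	}
	if gpus != "nvidia" && gpus != "all" {
		// Message text copied from the CLI output above.
		return errors.New(`The gpus flag must be passed a value of "nvidia" or "all"`)
	}
	if driver != "docker" || containerRuntime != "docker" {
		return errors.New("The gpus flag can only be used with the docker driver and docker container-runtime")
	}
	return nil
}

func main() {
	fmt.Println(validateGPUs("all", "docker", "docker")) // valid combination
	fmt.Println(validateGPUs("all", "kvm2", "docker"))   // wrong driver
	fmt.Println(validateGPUs("cat", "docker", "docker")) // unsupported value
}
```

Rejecting the combination before any container is created keeps the failure fast, which matches the transcripts above where minikube exits right after driver selection.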

cmd/minikube/cmd/start_flags.go

medyagh · 2023-10-03T19:38:03Z

/ok-to-test

cmd/minikube/cmd/start_flags.go

medyagh

Let's also add a warning for users that this feature is Beta and that we'd like to get their feedback.

site/content/en/docs/tutorials/nvidia.md

medyagh

Thank you!

k8s-ci-robot · 2023-10-06T01:14:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, spowelljr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [medyagh,spowelljr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

minikube-pr-bot · 2023-10-06T01:21:42Z

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17314) |
+----------------+----------+---------------------+
| minikube start | 51.1s    | 51.4s               |
| enable ingress | 27.8s    | 28.4s               |
+----------------+----------+---------------------+

Times for minikube start: 51.1s 50.8s 52.5s 49.7s 51.5s
Times for minikube (PR 17314) start: 51.5s 51.7s 52.3s 51.3s 50.4s

Times for minikube ingress: 28.2s 28.1s 26.3s 28.5s 27.7s
Times for minikube (PR 17314) ingress: 28.1s 28.2s 28.5s 28.6s 28.7s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17314) |
+----------------+----------+---------------------+
| minikube start | 24.0s    | 23.9s               |
| enable ingress | 20.7s    | 21.2s               |
+----------------+----------+---------------------+

Times for minikube ingress: 20.8s 20.3s 20.8s 20.8s 20.8s
Times for minikube (PR 17314) ingress: 20.8s 22.8s 20.9s 20.8s 20.4s

Times for minikube start: 24.6s 24.2s 24.8s 21.7s 24.6s
Times for minikube (PR 17314) start: 21.7s 25.1s 25.7s 22.0s 24.9s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17314) |
+----------------+----------+---------------------+
| minikube start | 22.6s    | 21.4s               |
| enable ingress | 32.1s    | 31.9s               |
+----------------+----------+---------------------+

Times for minikube start: 23.2s 19.9s 23.4s 22.9s 23.4s
Times for minikube (PR 17314) start: 20.8s 22.7s 23.3s 19.9s 20.3s

Times for minikube ingress: 31.3s 31.3s 47.3s 31.3s 19.3s
Times for minikube (PR 17314) ingress: 31.3s 31.3s 18.4s 47.3s 31.4s

minikube-pr-bot · 2023-10-06T03:24:29Z

These are the flake rates of all failed tests.

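+-------------------------+--------------------------------------------------------+----------------+
| ENVIRONMENT             | FAILED TESTS                                           | FLAKE RATE (%) |
+-------------------------+--------------------------------------------------------+----------------+
| KVM_Linux_containerd    | TestAddons/parallel/Ingress (gopogh)                   | 0.00 (chart)   |
| KVM_Linux               | TestNoKubernetes/serial/StartNoArgs (gopogh)           | 4.84 (chart)   |
| Docker_Linux_crio_arm64 | TestPause/serial/SecondStartNoReconfiguration (gopogh) | 10.40 (chart)  |
| Hyper-V_Windows         | TestRunningBinaryUpgrade (gopogh)                      | 32.00 (chart)  |
+-------------------------+--------------------------------------------------------+----------------+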

To see the flake rates of all tests by environment, click here.

k8s-ci-robot added the "cncf-cla: yes" label (indicates the PR's author has signed the CNCF CLA) on Sep 27, 2023

k8s-ci-robot requested review from medyagh and prezha September 27, 2023 20:40

k8s-ci-robot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) and the "size/L" label (denotes a PR that changes 100-499 lines, ignoring generated files) on Sep 27, 2023

medyagh requested changes Sep 27, 2023

cmd/minikube/cmd/start_flags.go (outdated)

medyagh requested changes Oct 2, 2023

cmd/minikube/cmd/start_flags.go (outdated)

medyagh changed the title ~~Automate installing NVIDIA Container Toolkit w/ flag~~ new flag -"-gpu" to enable Nvidia container runtime Oct 3, 2023

k8s-ci-robot added the "ok-to-test" label (indicates a non-member PR verified by an org member that is safe to test) on Oct 3, 2023

medyagh requested changes Oct 4, 2023

cmd/minikube/cmd/start_flags.go

spowelljr force-pushed the gpusFlag branch 3 times, most recently from 8569da1 to 81acfe3 on October 4, 2023 22:10

medyagh requested changes Oct 5, 2023

medyagh requested changes Oct 5, 2023

site/content/en/docs/tutorials/nvidia.md

k8s-ci-robot added the "needs-rebase" label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Oct 6, 2023

spowelljr added 6 commits October 5, 2023 17:18

Automate installing NVIDIA Container Toolkit

53b0908

add nvidia-device-plugin test

d07ff48

increase test timeout

1af615d

add missing label selector

22046cd

Automate installing NVIDIA Container Toolkit w/ flag

7f5fbf9

fix gpus flag not getting passed to docker

311630c

spowelljr added 7 commits October 5, 2023 17:18

added unit test for validate func

ab6a453

fix possible nil reference in test

76c1fd4

change gpus flag from bool to string

55b78ed

update TestValidateGPUs

8266558

add g shorthand for gpus

f1e05f1

add experimental warning to output

c71d9ee

add minikube version to doc

720b042

spowelljr force-pushed the gpusFlag branch from 07b1fcc to 720b042 on October 6, 2023 00:18

k8s-ci-robot removed the "needs-rebase" label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Oct 6, 2023

medyagh approved these changes Oct 6, 2023

medyagh merged commit 3fabfbe into kubernetes:master Oct 6, 2023
16 checks passed

spowelljr deleted the gpusFlag branch October 9, 2023 18:36
