Automate installing NVIDIA Container Toolkit --container-runtime #17287

spowelljr · 2023-09-20T23:27:47Z

$ minikube start --container-runtime nvidia-docker
😄  minikube v1.31.2 on Debian rodete
✨  Automatically selected the docker driver
📌  Using Docker driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=2, Memory=32100MB) ...
🛠️   Installing the NVIDIA Container Toolkit...
🐳  Preparing Kubernetes v1.28.2 on Docker 24.0.6 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image nvcr.io/nvidia/k8s-device-plugin:v0.14.1
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: nvidia-device-plugin, storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-version-check
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"
EOF
pod/nvidia-version-check created

$ kubectl logs nvidia-version-check
Fri Sep 22 18:45:31 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P1000        On   | 00000000:65:00.0 Off |                  N/A |
| 34%   26C    P8    N/A /  47W |     15MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

$ minikube start --driver kvm --container-runtime nvidia-docker
😄  minikube v1.31.2 on Debian rodete (kvm/amd64)
✨  Using the kvm2 driver based on user configuration

❌  Exiting due to MK_USAGE: The nvidia-docker container-runtime can only be run with the docker driver
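
The driver restriction shown in the last command can be pictured as a small validation step. The Go sketch below is illustrative only: `validateRuntimeDriver` and `errNvidiaNeedsDocker` are hypothetical names modeled on the output above, not the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errNvidiaNeedsDocker is modeled on the MK_USAGE message shown above
// (hypothetical; the real message is produced elsewhere in minikube).
var errNvidiaNeedsDocker = errors.New("the nvidia-docker container-runtime can only be run with the docker driver")

// validateRuntimeDriver is a hypothetical helper: it rejects the
// nvidia-docker runtime unless the selected driver is docker.
func validateRuntimeDriver(runtime, driver string) error {
	if runtime == "nvidia-docker" && driver != "docker" {
		return errNvidiaNeedsDocker
	}
	return nil
}

func main() {
	// Try the two drivers that appear in the transcripts above.
	for _, driver := range []string{"docker", "kvm2"} {
		if err := validateRuntimeDriver("nvidia-docker", driver); err != nil {
			fmt.Printf("driver=%s: %v\n", driver, err)
			continue
		}
		fmt.Printf("driver=%s: ok\n", driver)
	}
}
```

Under this assumption, any other runtime passes the check regardless of driver, which matches the unchanged behavior for existing runtimes.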

k8s-ci-robot · 2023-09-20T23:27:57Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: spowelljr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [spowelljr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

spowelljr · 2023-09-25T22:16:41Z

/ok-to-test

minikube-pr-bot · 2023-09-26T22:59:04Z

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17287) |
+----------------+----------+---------------------+
| minikube start | 50.2s    | 50.3s               |
| enable ingress | 27.0s    | 28.0s               |
+----------------+----------+---------------------+

Times for minikube start: 48.2s 51.3s 50.3s 50.4s 51.0s
Times for minikube (PR 17287) start: 50.1s 51.7s 50.8s 51.8s 47.0s

Times for minikube ingress: 27.1s 25.2s 27.7s 26.7s 28.1s
Times for minikube (PR 17287) ingress: 27.6s 27.7s 27.2s 29.2s 28.6s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17287) |
+----------------+----------+---------------------+
| minikube start | 24.4s    | 23.9s               |
| enable ingress | 21.0s    | 21.0s               |
+----------------+----------+---------------------+

Times for minikube start: 24.7s 25.1s 22.1s 24.1s 25.7s
Times for minikube (PR 17287) start: 25.1s 22.3s 25.1s 21.7s 25.1s

Times for minikube ingress: 20.8s 21.3s 21.3s 20.8s 20.8s
Times for minikube (PR 17287) ingress: 20.8s 20.8s 20.9s 22.8s 19.8s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17287) |
+----------------+----------+---------------------+
| minikube start | 23.3s    | 22.8s               |
| enable ingress | 32.0s    | 29.1s               |
+----------------+----------+---------------------+

Times for minikube start: 23.4s 20.8s 23.8s 24.7s 23.7s
Times for minikube (PR 17287) start: 24.3s 21.4s 24.2s 23.3s 21.0s

Times for minikube ingress: 31.3s 31.3s 47.3s 18.4s 31.4s
Times for minikube (PR 17287) ingress: 31.4s 20.3s 31.3s 31.3s 31.3s
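
As a quick sanity check, the table figures appear to be averages of the five per-run times. The Go sketch below recomputes the mean for the kvm2 `minikube start` runs listed above; `mean` is a hypothetical helper, and small differences from the table are expected because the listed times are already rounded.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Per-run `minikube start` times in seconds for the kvm2 driver,
	// copied from the lists above.
	base := []float64{48.2, 51.3, 50.3, 50.4, 51.0}
	pr := []float64{50.1, 51.7, 50.8, 51.8, 47.0}

	fmt.Printf("minikube:            %.2fs\n", mean(base))
	fmt.Printf("minikube (PR 17287): %.2fs\n", mean(pr))
}
```

The recomputed means are 50.24s and 50.28s, in line with the 50.2s and 50.3s reported in the kvm2 table.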

minikube-pr-bot · 2023-09-27T01:06:42Z

These are the flake rates of all failed tests.

Environment	Failed Tests	Flake Rate (%)
KVM_Linux_crio	TestStartStop/group/no-preload/serial/Pause (gopogh)	n/a
KVM_Linux_crio	TestStartStop/group/no-preload/serial/VerifyKubernetesImages (gopogh)	n/a
Docker_Linux_containerd	TestKVMDriverInstallOrUpdate (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/CertSync (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/DockerEnv/powershell (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageBuild (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageListJson (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageListShort (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageListTable (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageListYaml (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageLoadDaemon (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageLoadFromFile (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageReloadDaemon (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageSaveToFile (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ImageCommands/ImageTagAndLoadDaemon (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/MySQL (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/NodeLabels (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/PersistentVolumeClaim (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ServiceCmdConnect (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ServiceCmd/DeployApp (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ServiceCmd/Format (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ServiceCmd/HTTPS (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ServiceCmd/JSONOutput (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ServiceCmd/List (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/ServiceCmd/URL (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/StatusCmd (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/TunnelCmd/serial/RunSecondTunnel (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/TunnelCmd/serial/WaitService/Setup (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/parallel/UpdateContextCmd/no_changes (gopogh)	0.00 (chart)
Hyper-V_Windows	TestFunctional/serial/CacheCmd/cache/cache_reload (gopogh)	0.00 (chart)
More tests...	Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

medyagh

Let's try to make another PR that keeps the container runtimes at three and adds a flag called --enable-nvidia. That way, if we later have amd-gpu, and also containerd and crio, we could enable it for them.
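
One way to picture the suggestion is a boolean flag that is independent of the runtime choice. The Go sketch below uses the standard `flag` package rather than minikube's own CLI wiring; `config`, `parseArgs`, and the runtime spellings (taken from the comment's wording) are assumptions for illustration, and `--enable-nvidia` is the proposed flag, not an existing one.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the two independent settings: which of the three
// container runtimes to use, and whether NVIDIA support is requested.
type config struct {
	ContainerRuntime string
	EnableNvidia     bool
}

// parseArgs is a hypothetical parser for the proposed flag layout.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	fs.StringVar(&c.ContainerRuntime, "container-runtime", "docker", "docker, containerd, or crio")
	fs.BoolVar(&c.EnableNvidia, "enable-nvidia", false, "request NVIDIA GPU support (proposed flag)")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	// The runtime list stays at three; GPU support is not a runtime.
	switch c.ContainerRuntime {
	case "docker", "containerd", "crio":
		return c, nil
	default:
		return config{}, fmt.Errorf("unknown container runtime %q", c.ContainerRuntime)
	}
}

func main() {
	c, err := parseArgs([]string{"--container-runtime", "docker", "--enable-nvidia"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("runtime=%s nvidia=%t\n", c.ContainerRuntime, c.EnableNvidia)
}
```

Keeping the GPU option out of the runtime list means a later AMD option, or NVIDIA support on containerd or crio, would only need another flag or a relaxed check rather than a new runtime name.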

k8s-ci-robot · 2023-10-04T23:37:04Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the do-not-merge/work-in-progress and cncf-cla: yes labels Sep 20, 2023

k8s-ci-robot requested review from afbjorklund and prezha September 20, 2023 23:27

k8s-ci-robot added the approved and size/M labels Sep 20, 2023

spowelljr force-pushed the gpus branch from 32ee304 to 7574929 Compare September 21, 2023 17:26

k8s-ci-robot added the size/L label and removed the size/M label Sep 21, 2023

spowelljr force-pushed the gpus branch 3 times, most recently from a569fbb to 091ff2d Compare September 25, 2023 17:54

spowelljr changed the title ~~WIP: Automate installing NVIDIA Container Toolkit~~ Automate installing NVIDIA Container Toolkit Sep 25, 2023

k8s-ci-robot removed the do-not-merge/work-in-progress label Sep 25, 2023

k8s-ci-robot added the ok-to-test label Sep 25, 2023

spowelljr added 4 commits September 26, 2023 14:58

Automate installing NVIDIA Container Toolkit

fdf4bfd

add nvidia-device-plugin test

28695db

increase test timeout

5852be5

add missing label selector

b64950e

spowelljr force-pushed the gpus branch from a2c19af to b64950e Compare September 26, 2023 22:00

medyagh requested changes Sep 27, 2023

spowelljr mentioned this pull request Sep 27, 2023

new flag -"-gpu" to enable Nvidia container runtime #17314

Merged

medyagh changed the title ~~Automate installing NVIDIA Container Toolkit~~ Automate installing NVIDIA Container Toolkit --container-runtime Oct 2, 2023

k8s-ci-robot added the needs-rebase label Oct 4, 2023

spowelljr closed this Oct 5, 2023

spowelljr deleted the gpus branch October 24, 2023 18:24
