install gpu-operator fail #11172

absolutelyZero · 2024-10-25T09:37:54Z

Environmental Info:
K3s Version:

(base) [root@localhost docker-package]# k3s -v
k3s version v1.27.9+k3s1 (2c249a39)
go version go1.20.12

root@corp:~/gpu-operator/gpu-operator-24.6.2/deployments# helm version
version.BuildInfo{Version:"v3.16.2", GitCommit:"13654a52f7c70a143b1dd51416d633e1071faffb", GitTreeState:"clean", GoVersion:"go1.22.7"}

Node(s) CPU architecture, OS, and Version:

(base) [root@localhost docker-package]# cat /etc/redhat-release 
CentOS Linux release 8.5.2111

Cluster Configuration:

root@corp:~/gpu-operator/gpu-operator-24.6.2/deployments# kubectl get node -o wide
NAME                    STATUS   ROLES                  AGE    VERSION        INTERNAL-IP      EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION                CONTAINER-RUNTIME
corp                    Ready    control-plane,master   235d   v1.27.9+k3s1   172.16.104.232   172.16.104.232   Ubuntu 20.04.6 LTS   5.4.0-177-generic             containerd://1.7.11-k3s2.27
localhost.localdomain   Ready    <none>                 23h    v1.27.9+k3s1   172.16.103.188   172.16.103.188   CentOS Linux 8       4.18.0-348.7.1.el8_5.x86_64   containerd://1.7.11-k3s2.27

Describe the bug:

The cluster is provisioned with Helm. There is an NVIDIA 4090 GPU card in node localhost.localdomain.

When I try to install gpu-operator 24.6.2 with the Helm chart, I get the following error:

root@corp:~/gpu-operator/gpu-operator-24.6.2# helm install nvidiagpu -n gpu-operator --create-namespace \
    --set toolkit.env[0].name=CONTAINERD_CONFIG \
    --set toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml \
    --set toolkit.env[1].name=CONTAINERD_SOCKET \
    --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
    --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
    --set toolkit.env[2].value=nvidia \
    --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
    --set-string toolkit.env[3].value=true \
    nvidia/gpu-operator
Error: INSTALLATION FAILED: failed to install CRD crds/nvidia.com_clusterpolicies_crd.yaml: resource mapping not found for name: "clusterpolicies.nvidia.com" namespace: "" from "": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"
ensure CRDs are installed first

Expected behavior:

helm install succeeds.

Actual behavior:

helm install fails.


brandond · 2024-10-25T15:16:24Z

I don't believe this is an issue with k3s. This looks like a problem with the chart or the values you're passing when installing the chart. Did you want to enable creating the CRDs?
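
One way to narrow this down would be to confirm that the API server Helm is talking to actually serves the apiextensions.k8s.io/v1 group named in the error. The following is a minimal sketch, assuming kubectl and helm are using the same kubeconfig and context; the helper names has_api_version and check_crd_api are hypothetical and only illustrate the check.

```shell
# Hypothetical helper: succeeds if stdin contains a line that is exactly
# the given group/version string (fixed-string, whole-line match).
has_api_version() {
  grep -qxF -- "$1"
}

# Hypothetical helper: reads a list of API versions (one per line) on stdin
# and reports whether the CRD API group/version from the error is present.
check_crd_api() {
  if has_api_version "apiextensions.k8s.io/v1"; then
    echo "apiextensions.k8s.io/v1 is served"
  else
    echo "apiextensions.k8s.io/v1 is NOT served"
    return 1
  fi
}
```

It could be used as `kubectl api-versions | check_crd_api`, run in the same shell (and with the same KUBECONFIG, if one is set) as the failing helm install. If the group is reported as served, the cluster itself is probably not the problem and the chart or the client configuration is the more likely place to look.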

github-project-automation bot added this to K3s Development Oct 25, 2024

github-project-automation bot moved this to New in K3s Development Oct 25, 2024

absolutelyZero changed the title ~~instaill gpu-operator fail~~ install gpu-operator fail Oct 25, 2024

k3s-io locked and limited conversation to collaborators Oct 25, 2024

brandond converted this issue into discussion #11174 Oct 25, 2024

github-project-automation bot moved this from New to Done Issue in K3s Development Oct 25, 2024
