This issue was moved to a discussion.

install gpu-operator fail #11172

Closed
absolutelyZero opened this issue Oct 25, 2024 · 1 comment
Comments

@absolutelyZero

Environmental Info:
K3s Version:

(base) [root@localhost docker-package]# k3s -v
k3s version v1.27.9+k3s1 (2c249a39)
go version go1.20.12

root@corp:~/gpu-operator/gpu-operator-24.6.2/deployments# helm version
version.BuildInfo{Version:"v3.16.2", GitCommit:"13654a52f7c70a143b1dd51416d633e1071faffb", GitTreeState:"clean", GoVersion:"go1.22.7"}

Node(s) CPU architecture, OS, and Version:

(base) [root@localhost docker-package]# cat /etc/redhat-release 
CentOS Linux release 8.5.2111

Cluster Configuration:

root@corp:~/gpu-operator/gpu-operator-24.6.2/deployments# kubectl get node -o wide
NAME                    STATUS   ROLES                  AGE    VERSION        INTERNAL-IP      EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION                CONTAINER-RUNTIME
corp                    Ready    control-plane,master   235d   v1.27.9+k3s1   172.16.104.232   172.16.104.232   Ubuntu 20.04.6 LTS   5.4.0-177-generic             containerd://1.7.11-k3s2.27
localhost.localdomain   Ready    <none>                 23h    v1.27.9+k3s1   172.16.103.188   172.16.103.188   CentOS Linux 8       4.18.0-348.7.1.el8_5.x86_64   containerd://1.7.11-k3s2.27

Describe the bug:

The cluster is provisioned with Helm. There is an NVIDIA 4090 GPU card in node localhost.localdomain.

When I try to install gpu-operator 24.6.2 with the Helm chart, I get the following error:

root@corp:~/gpu-operator/gpu-operator-24.6.2# helm install nvidiagpu -n gpu-operator --create-namespace --set toolkit.env[0].name=CONTAINERD_CONFIG --set toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml --set toolkit.env[1].name=CONTAINERD_SOCKET --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS --set toolkit.env[2].value=nvidia --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT --set-string toolkit.env[3].value=true nvidia/gpu-operator
Error: INSTALLATION FAILED: failed to install CRD crds/nvidia.com_clusterpolicies_crd.yaml: resource mapping not found for name: "clusterpolicies.nvidia.com" namespace: "" from "": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"
ensure CRDs are installed first

Expected behavior:

helm install succeeds.

Actual behavior:

helm install fails.

@absolutelyZero absolutelyZero changed the title from "instaill gpu-operator fail" to "install gpu-operator fail" Oct 25, 2024
@brandond
Member

brandond commented Oct 25, 2024

I don't believe this is an issue with k3s. This looks like a problem with the chart or the values you're passing when installing the chart. Did you want to enable creating the CRDs?
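
One way to narrow down an error like the one above is to check whether the API server the client is talking to actually lists `apiextensions.k8s.io/v1` among its served API versions. The following is only a minimal sketch, assuming `kubectl` is configured for the same cluster that `helm` targets. The helper names `has_crd_v1` and `check_cluster` are hypothetical and are not part of k3s, Helm, or the gpu-operator chart.

```shell
#!/bin/sh
# Hypothetical helper: reads a list of API versions (one per line, in the
# format printed by `kubectl api-versions`) on stdin and succeeds only if
# the exact line "apiextensions.k8s.io/v1" is present.
has_crd_v1() {
  grep -qxF 'apiextensions.k8s.io/v1'
}

# Hypothetical wrapper: runs the check against the cluster that the current
# kubectl context points at. Assumes kubectl is installed and configured.
check_cluster() {
  if kubectl api-versions | has_crd_v1; then
    echo "apiextensions.k8s.io/v1 is served"
  else
    echo "apiextensions.k8s.io/v1 is not visible from this client"
  fi
}
```

Calling `check_cluster` from the shell that runs `helm install` shows whether that client sees the CRD API at all. If it does not, the client configuration (for example, which kubeconfig is in use) may be worth checking before the chart itself.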

@k3s-io k3s-io locked and limited conversation to collaborators Oct 25, 2024
@brandond brandond converted this issue into discussion #11174 Oct 25, 2024
@github-project-automation github-project-automation bot moved this from New to Done Issue in K3s Development Oct 25, 2024
