Not able to run JobSet 0.7.1 (latest for now) with GKE CSI Driver #730

Closed
raj-prince opened this issue Dec 10, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

raj-prince commented Dec 10, 2024

What happened:
The latest JobSet release (0.7.1) does not work with the GKE CSI driver (gcsfuse.csi.storage.gke.io). The same manifest works with an older release (0.5.2).

What you expected to happen:
Latest JobSet (0.7.1) should work with GKE CSI Driver.

How to reproduce it (as minimally and precisely as possible):
JobSet.yaml

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: test-jobset
  annotations:
    alpha.jobset.sigs.k8s.io/exclusive-topology: cloud.google.com/gke-nodepool
spec:
  failurePolicy:
    maxRestarts: 0
  replicatedJobs:
  - name: main
    replicas: 1
    template:
      spec:
        parallelism: 1 # Should be smaller than the number of VMs
        completions: 1 # Same as the above.
        backoffLimit: 0   # When any pod fails, the job is failed
        template:
          metadata:
            labels:
              tessellations.google.com/workload: training-microbenchmark
            annotations:
              gke-gcsfuse/volumes: "true"
              gke-gcsfuse/cpu-limit: "0"
              gke-gcsfuse/memory-limit: "0"
              gke-gcsfuse/ephemeral-storage-limit: "0"

          spec:
            containers:
            - name: benchmark
              image: busybox
              command: [ "sleep" ]
              args: [ "infinity" ]
              volumeMounts:
              - mountPath: /mnt/benchmark-output
                name: gcsfuse-outputs
                readOnly: false
            serviceAccountName: <kubernetes_sa>
            volumes:
            - name: gcsfuse-outputs
              csi:
                driver: gcsfuse.csi.storage.gke.io
                volumeAttributes:
                  bucketName: <bucket_name>

I applied the above JobSet.yaml in a cluster with JobSet 0.7.1 installed, and it did not work. Command used to install JobSet 0.7.1:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.1/manifests.yaml

The same JobSet.yaml worked after I replaced JobSet 0.7.1 with JobSet 0.5.2. Commands used (first delete JobSet 0.7.1, then install JobSet 0.5.2):

# Deleted the v0.7.1 jobset from the cluster
kubectl delete -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.1/manifests.yaml

# Installed the v0.5.2 jobset
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml
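
The delete-then-install steps above can be wrapped in a small script so the two versions are easier to compare. This is only a sketch: the function names `jobset_manifest_url` and `switch_jobset_version` are hypothetical helpers, not part of JobSet or kubectl, and the script assumes kubectl is already configured for the target cluster. The kubectl invocations are the same ones used above.

```shell
#!/usr/bin/env bash
# Sketch: switch the JobSet release installed in a cluster.
# Both function names are hypothetical helpers, not part of JobSet or kubectl.

# Build the release manifest URL for a JobSet version such as "0.7.1".
jobset_manifest_url() {
  local version="$1"
  echo "https://github.com/kubernetes-sigs/jobset/releases/download/v${version}/manifests.yaml"
}

# Delete one JobSet release and install another, using the same kubectl
# invocations as the commands above.
switch_jobset_version() {
  local from="$1" to="$2"
  kubectl delete -f "$(jobset_manifest_url "$from")"
  kubectl apply --server-side -f "$(jobset_manifest_url "$to")"
}

# Example (requires kubectl access to the cluster):
#   switch_jobset_version 0.7.1 0.5.2
#   kubectl apply -f JobSet.yaml
```

Defining functions without calling them keeps the script safe to source; nothing touches the cluster until `switch_jobset_version` is invoked explicitly.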

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: v1.30.6-dispatcher
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5-gke.1699000
  • JobSet version (use git describe --tags --dirty --always): latest 0.7.1
  • Cloud provider or hardware configuration: GCP
  • Install tools:
  • Others:

raj-prince added the kind/bug label on Dec 10, 2024.
raj-prince (Author) commented:

This issue is a duplicate of #729.
