[Spike: 1hr] Investigate why nodeSelectors are not being respected for dask #4600

Closed
sgibson91 opened this issue Aug 9, 2024 · 3 comments · Fixed by #4662
sgibson91 commented Aug 9, 2024

In #4482 and #4576, we put each hub on the openscapes and nasa-veda clusters into its own nodegroup for cost allocation purposes. Both are dask-enabled, so there is a dask nodegroup per hub as well.

I don't believe the nodeSelectors that should place the dask workers into the associated nodegroups are working. Here is a video where I create a dask worker on the openscapes staging hub, but it is scheduled to the dask-workshop nodegroup rather than the dask-staging nodegroup:

video2293414048.mp4

Here is the relevant config:

dask-gateway:
  gateway:
    backend:
      scheduler:
        extraPodConfig:
          nodeSelector:
            2i2c/hub-name: staging
      worker:
        extraPodConfig:
          nodeSelector:
            2i2c/hub-name: staging

Definition of Done

We understand why dask workers (and schedulers) are not being scheduled on the associated hub-specific nodegroup.

@consideRatio consideRatio self-assigned this Aug 22, 2024
consideRatio commented

I'll pick this up before #4648 because I have ~1 hour left in my day and would prefer to pick the other thing up tomorrow morning instead.

consideRatio commented

Thank you for recording a video @sgibson91!!! ❤️ 🎉

From the video I see that the dask-worker created only got a node selector of k8s.dask.org/node-purpose=worker - a great starting point to investigate from!

consideRatio commented

The issue was that dask-gateway wasn't configured under basehub.dask-gateway for all daskhub charts, which now install dask-gateway as a dependency of basehub rather than as a dependency of the daskhub chart itself.

After adjusting the indentation of the config so it goes under basehub.dask-gateway, the issue is resolved:

kubectl get pod -n staging -o yaml dask-worker-2493c9f4dc3f4bb99b7c893693b2e02b-c2dkl | grep -A10 nodeSe
  nodeSelector:
    2i2c/hub-name: staging
    k8s.dask.org/node-purpose: worker
  # ...

kubectl get pod -n staging -o yaml dask-scheduler-2493c9f4dc3f4bb99b7c893693b2e02b | grep -A10 nodeSe
  nodeSelector:
    2i2c/hub-name: staging
    k8s.dask.org/node-purpose: scheduler
  # ...
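
For reference, the re-nested config described above would presumably look something like the sketch below. It reuses the values from the original report and assumes the key path is basehub.dask-gateway (inferred from the dask-gateway key in the original config and from the chart dependency change); the actual change is in #4662, which this sketch is not copied from.

```yaml
# Sketch only, not copied from the fixing PR.
# Assumption: dask-gateway is a dependency of the basehub chart, so in a
# daskhub-based hub its values must sit under basehub.dask-gateway
# instead of at the top level.
basehub:
  dask-gateway:
    gateway:
      backend:
        scheduler:
          extraPodConfig:
            nodeSelector:
              2i2c/hub-name: staging
        worker:
          extraPodConfig:
            nodeSelector:
              2i2c/hub-name: staging
```

Helm only passes values to a subchart when they are nested under the key path of that subchart, so a top-level dask-gateway block would be silently ignored once the dependency moved into basehub. That would be consistent with the pod in the video having only the default k8s.dask.org/node-purpose selector.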
