In #4482 and #4576, we put each hub on the openscapes and nasa-veda clusters into its own nodegroup for cost allocation purposes. Both clusters are dask-enabled, so there are dask nodegroups per hub as well.
I don't believe the nodeSelectors that should place the dask workers into the associated nodegroups are working. Here is a video where I create a dask worker on the openscapes staging hub, but it is scheduled to the dask-workshop nodegroup rather than the dask-staging nodegroup:
Thank you for recording a video @sgibson91!!! ❤️ 🎉
From the video I see that the dask-worker created only got a node selector of k8s.dask.org/node-purpose=worker - a great starting point to investigate from!
The issue was that dask-gateway wasn't configured under basehub.dask-gateway for all daskhub charts, which now install dask-gateway as a dependency of basehub rather than as a dependency of the daskhub chart itself.
After adjusting the indentation of the config so it sits under basehub.dask-gateway, the issue is resolved:
kubectl get pod -n staging -o yaml dask-worker-2493c9f4dc3f4bb99b7c893693b2e02b-c2dkl | grep -A10 nodeSe
  nodeSelector:
    2i2c/hub-name: staging
    k8s.dask.org/node-purpose: worker
  # ...
kubectl get pod -n staging -o yaml dask-scheduler-2493c9f4dc3f4bb99b7c893693b2e02b | grep -A10 nodeSe
  nodeSelector:
    2i2c/hub-name: staging
    k8s.dask.org/node-purpose: scheduler
  # ...
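For readers following along, the nesting fix described above might look roughly like the sketch below. This is not the actual content of staging.values.yaml (those lines are not reproduced in this thread). It assumes the dask-gateway subchart's values live under a `basehub.dask-gateway` key and that the hub-specific selector is set through the dask-gateway chart's `extraPodConfig` options. Only the `2i2c/hub-name: staging` label is taken from the output above.

```yaml
# Sketch only: illustrative nesting, not the real staging.values.yaml.
basehub:
  # Assumption: dask-gateway is a dependency of basehub, so its values
  # must be nested under basehub rather than at the top level.
  dask-gateway:
    gateway:
      backend:
        scheduler:
          extraPodConfig:
            nodeSelector:
              2i2c/hub-name: staging
        worker:
          extraPodConfig:
            nodeSelector:
              2i2c/hub-name: staging
```

If the `dask-gateway` block sits one level too high (outside `basehub`), Helm does not pass these values to the subchart, which would be consistent with the pods only receiving the default `k8s.dask.org/node-purpose` selector.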
video2293414048.mp4
Here is the relevant config:
infrastructure/config/clusters/openscapes/staging.values.yaml
Lines 27 to 37 in 45ec02f
Definition of Done
We understand why dask workers (and schedulers) are not being scheduled on the associated hub-specific nodegroup.