KubeCluster with service type LoadBalancer on AKS stuck in "Pending" state on startup #906
Comments
Thanks for raising this. Internally, the resource data is wrapped in a Box with camel_killer_box=True:
>>> import box
>>> data_in = {"loadBalancer": True}
>>> data = box.Box(data_in, camel_killer_box=True)
You can access the key using either snake case or camel case:
>>> data.load_balancer
True
>>> data.loadBalancer
True
Unfortunately this operation is not reversed in to_dict(), which returns the snake case key:
>>> data.to_dict()
{'load_balancer': True}
Given that, it looks like this bit of code where we look up the load balancer info is still using the snake case and needs updating to camel case.
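A minimal sketch of what a camel case lookup could look like, assuming the Service status is available as a plain dict in the shape the Kubernetes API returns (status.loadBalancer.ingress). The helper name external_address and the sample data are hypothetical and are not taken from dask-kubernetes or kr8s.

```python
from typing import Optional


def external_address(status: dict) -> Optional[str]:
    """Return the first load balancer IP or hostname, or None if none is assigned yet."""
    # The Kubernetes API reports this under the camelCase key "loadBalancer".
    load_balancer = status.get("loadBalancer") or {}
    for entry in load_balancer.get("ingress") or []:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None


# Hand-written sample statuses (assumptions, not output from a real cluster).
assigned = {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}
pending = {"loadBalancer": {}}

print(external_address(assigned))
print(external_address(pending))
```

Returning None while no ingress entry exists lets the caller keep waiting instead of failing, which is the usual situation while a cloud provider is still provisioning the load balancer.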
I've opened up #907 to resolve this here. @blankhold could you confirm that these are the changes you made that fixed things for you? It's a little hard to test this fix as we can't easily provision load balancers in our test suite.
Awesome, thanks for the explanation and PR @jacobtomlinson! Seems like there are a few more patches I had to make. I pulled in the changes from your PR and also the changes you recently made in kr8s kr8s-org/kr8s@665ff53. Some other things:
With the changes above, I was able to get a DaskCluster with a LoadBalancer service and access the dashboard externally.
Ok great! Perhaps it's easier if you put up a PR that supersedes mine and makes the changes you need, and we can discuss specifics there.
Thanks! Created #908, which I think should fix the issues. Also, do you know when kr8s can make a new release so that a new version of dask-kubernetes picks up your recent changes? Or how will the release go so that we can finally pick up these changes once merged?
I didn't realize how to test the controller before, so my comment about getting stuck in wait_for_controller might not be applicable, or might just be fixed by rebuilding the controller to test. Will let you know if that worked.
I've just released
Describe the issue:
I'm trying to create a KubeCluster with AKS LoadBalancer, but the DaskCluster is stuck in a pending state at startup.
Minimal Complete Verifiable Example:
Trying to create a cluster like this:
but the DaskCluster is stuck in the Pending state until it times out.
Anything else we need to know?:
I looked at our service spec, and it seems our resource definition has the load balancer under the key "loadBalancer" when the scheduler service comes up (as referenced in the Kubernetes docs: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer)
while dask-kubernetes seems to look around for "load_balancer" in status:
dask-kubernetes/dask_kubernetes/operator/controller/controller.py
Lines 403 to 406 in 2ecfdcd
For what it's worth, the kr8s library also seems to look for "load_balancer".
When I switched these over to using "loadBalancer", the DaskCluster was able to come up without any issues. Even when I didn't change the code, the service, pods, deployments, etc. were all set up and running; only the DaskCluster was pending.
Any help is appreciated, thanks!
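The mismatch described above can be illustrated with a small sketch. It assumes the status is a plain dict keyed the way the Kubernetes API returns it; has_ingress is a hypothetical check written for illustration, not the controller's actual code, and the sample status is hand-written.

```python
def has_ingress(status: dict, key: str) -> bool:
    """Report whether the status lists at least one load balancer ingress entry under `key`."""
    section = status.get(key) or {}
    return len(section.get("ingress") or []) > 0


# Hand-written sample: a Service whose load balancer has already been assigned an IP.
service_status = {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}

# A snake case lookup never finds the entry, so a readiness check built on it
# would keep waiting even though the Service itself is fully provisioned.
print(has_ingress(service_status, "load_balancer"))

# The camel case lookup matches the key the API actually uses.
print(has_ingress(service_status, "loadBalancer"))
```

This would be consistent with the service, pods, and deployments all running while only the DaskCluster remains pending.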
Environment: