Dask can not connect to scheduler using NodePort when node has multiple addresses #806

Closed
dbalabka opened this issue Aug 25, 2023 · 5 comments

@dbalabka
Contributor

dbalabka commented Aug 25, 2023

Describe the issue:

If we create a scheduler with service type NodePort, Dask keeps trying to connect to the scheduler forever:

╭─────────────────── Creating KubeCluster 'dmitryb-cluster' ────────────────────╮
│                                                                              │
│   DaskCluster                                                      Running   │
│   Scheduler Pod                                                    Running   │
│   Scheduler Service                                                Created   │
│   Default Worker Group                                             Created   │
│                                                                              │
│ ⠼ Connecting to scheduler                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯

It seems that get_external_address_for_scheduler_service returns the address with the internal port, tcp://172.16.18.191:8786, instead of the node port that the service maps it to:

❯ kubectl get services
NAME                       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
dmitryb-cluster-scheduler   NodePort   10.105.154.53   <none>        8786:32167/TCP,8787:30729/TCP   8m45s

return f"tcp://{host}:{port}"

As a solution, we have to use node_port instead of port when the service type is NodePort.

So, the correct connection URI should be: tcp://172.16.18.191:32167
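
To illustrate the proposed change, here is a minimal sketch of the port selection. It is not the actual dask-kubernetes code: the helper name `scheduler_address`, the plain-dict service spec, and the `comm_port` parameter are assumptions made for this example. The port numbers are taken from the `kubectl get services` output in this report.

```python
def scheduler_address(host: str, service_spec: dict, comm_port: int = 8786) -> str:
    """Build a scheduler address from a Service spec given as a plain dict.

    Hypothetical helper for illustration only. `service_spec` mirrors the
    `spec` section of a Kubernetes Service (`type` and `ports`).
    """
    # Find the entry for the scheduler comm port (8786 is the Dask default).
    for entry in service_spec["ports"]:
        if entry["port"] == comm_port:
            break
    else:
        raise ValueError(f"no service port {comm_port} found")

    # For a NodePort service, clients outside the cluster must use the
    # node port, not the service port.
    if service_spec.get("type") == "NodePort":
        port = entry["nodePort"]
    else:
        port = entry["port"]
    return f"tcp://{host}:{port}"


# Values taken from the service listing in this report.
service_spec = {
    "type": "NodePort",
    "ports": [
        {"port": 8786, "nodePort": 32167, "protocol": "TCP"},
        {"port": 8787, "nodePort": 30729, "protocol": "TCP"},
    ],
}

print(scheduler_address("172.16.18.191", service_spec))
```

With the values above this yields the node-port URI rather than the one ending in 8786.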

Minimal Complete Verifiable Example:

from dask_kubernetes.operator import KubeCluster, make_cluster_spec

spec = make_cluster_spec(
    name="dmitryb-cluster",
    image="ghcr.io/dask/dask:2023.8.1-py3.10",
    n_workers=1,
    scheduler_service_type="NodePort",
)

cluster = KubeCluster(
    namespace="dask-operator",
    custom_cluster_spec=spec,
)

Environment:

dask-kubernetes = "~2023.8.0"
dask = "~2023.8.0"
@jacobtomlinson
Member

jacobtomlinson commented Aug 25, 2023

I expect the bug is on this line. If your node has two addresses, only the first is returned. We will need some way to figure out which is public and which is private.

host = nodes.items[0].status.addresses[0].address

@jacobtomlinson jacobtomlinson changed the title from "Dask can not connect to scheduler if using NodePort" to "Dask can not connect to scheduler using NodePort when node has multiple addresses" on Aug 25, 2023
@dbalabka
Contributor Author

dbalabka commented Aug 25, 2023

@jacobtomlinson, in my case the IP is correct. Using the node_port number solves the issue.
So, the correct connection URI should be: tcp://172.16.18.191:32167

We have only one IP address per node. It's not a problem. Here is an example of the API output:

  status:
    addresses:
    - address: 172.16.18.192
      type: InternalIP
    - address: k8s-master-2.internal.loc
      type: Hostname

However, I agree that filtering these addresses by type is a better approach than simply taking the first one.

@dbalabka
Contributor Author

@jacobtomlinson, just to clarify: we do have access to internal IP addresses. So if you plan to filter IP addresses by type, please keep an option to choose internal IP addresses as well.
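
A rough sketch of what filtering by type could look like, with the preference order left configurable so that internal addresses remain selectable. The helper name `pick_node_address` and the default order are assumptions for illustration, not the library's behaviour. The address list is the one from the API output quoted earlier in this thread.

```python
def pick_node_address(
    addresses: list,
    preferred_types: tuple = ("ExternalIP", "InternalIP", "Hostname"),
) -> str:
    """Return the first node address matching the preferred types, in order.

    Hypothetical helper. `addresses` mirrors `status.addresses` of a
    Kubernetes Node, given as a list of {"address", "type"} dicts.
    """
    for wanted in preferred_types:
        for entry in addresses:
            if entry["type"] == wanted:
                return entry["address"]
    raise ValueError(f"no address of types {preferred_types} found")


# Addresses from the node status shown in this issue.
addresses = [
    {"address": "172.16.18.192", "type": "InternalIP"},
    {"address": "k8s-master-2.internal.loc", "type": "Hostname"},
]

# No ExternalIP is present, so the default order falls back to InternalIP.
print(pick_node_address(addresses))

# A caller could also request a specific type explicitly.
print(pick_node_address(addresses, preferred_types=("Hostname",)))
```

Passing the order as a parameter means a cluster that is only reachable over internal addresses can put InternalIP first instead of relying on the position of entries in the list.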

dbalabka added a commit to dbalabka/dask-kubernetes that referenced this issue Aug 29, 2023
@dbalabka
Contributor Author

dbalabka commented Aug 29, 2023

@jacobtomlinson I've prepared a PR that shows how I think it should be fixed: #808

jacobtomlinson pushed a commit that referenced this issue Sep 4, 2023
* Fix port number when use NodePort (#806)

* Fix up
@jacobtomlinson
Member

Closed by #808
