[Bug]: milvus-querycoord, milvus-proxy, and milvus-datanode cannot connect to rootcoord; they dial 127.0.0.1:53100 instead of [milvus-rootcoord]:53100 #34260

Closed
1 task done
milvus-user opened this issue Jun 28, 2024 · 11 comments
Labels: kind/bug, stale, triage/accepted

Comments

milvus-user commented Jun 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.4.4
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 4 cpu and 8Gi memory
- GPU: 
- Others:

Current Behavior

milvus-querycoord, milvus-proxy, and milvus-datanode connect to 127.0.0.1:53100 instead of [milvus-rootcoord]:53100.

Expected Behavior

milvus-querycoord, milvus-proxy, and milvus-datanode should connect to [milvus-rootcoord]:53100 rather than 127.0.0.1:53100.

When these components try to reach rootcoord over localhost inside their own pods, the connection is refused, because rootcoord runs in a different pod.

Steps To Reproduce

Deploy Milvus v2.4.4 with the default Helm chart, passing the required values in values.yaml. The error appears without any further changes.

Milvus Log

[2024/06/28 05:27:26.834 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/28 05:27:26.837 +00:00] [DEBUG] [querycoord/service.go:218] [network] [port=19531]
[2024/06/28 05:27:26.938 +00:00] [INFO] [etcd/etcd_util.go:49] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[milvus-etcd:2379]"] [minVersion=1.3]
[2024/06/28 05:27:26.941 +00:00] [DEBUG] [sessionutil/session_util.go:257] ["Session try to connect to etcd"]
[2024/06/28 05:27:26.942 +00:00] [DEBUG] [sessionutil/session_util.go:272] ["Session connect to etcd success"]
[2024/06/28 05:27:26.943 +00:00] [DEBUG] [querycoord/service.go:168] ["QueryCoord try to wait for RootCoord ready"]
[2024/06/28 05:27:26.944 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:26.944 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:26.945 +00:00] [WARN] [grpcclient/client.go:554] ["fail to get grpc client"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:26.945 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:26.946 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:26.946 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:26.947 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:26.947 +00:00] [WARN] [grpcclient/client.go:467] ["retry func failed"] [retried=0] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:27.038 +00:00] [DEBUG] [querycoordv2/server.go:584] ["QueryCoord current state"] [StateCode=Abnormal]
[2024/06/28 05:27:27.148 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100,
reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:27.149 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:27.149 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:27.150 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc =
"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:27.551 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100,
reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:27.552 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:27.552 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:27.553 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc =
"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:28.354 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100,
reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:28.355 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:28.355 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:28.356 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc =
"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:29.957 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100,
reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:29.958 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:29.958 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:29.959 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc =
"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]

##################

[2024/06/28 05:27:24.391 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:24.391 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:24.392 +00:00] [WARN] [grpcclient/client.go:554] ["fail to get grpc client"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:24.392 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:24.393 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:24.393 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:24.394 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:24.394 +00:00] [WARN] [grpcclient/client.go:467] ["retry func failed"] [retried=0] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error
while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:24.595 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100,
reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:24.596 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/06/28 05:27:24.596 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=1146]
[2024/06/28 05:27:24.597 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc =
"transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:24.998 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100,
reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/06/28 05:27:25.000 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]

Anything else?

It works with version v2.3.

config.tpl generates the expected result, so there is no issue in the Helm chart:

kafka:
  brokerList: milvus-kafka:9092

rootCoord:
  address: milvus-rootcoord
  port: 53100
  enableActiveStandby: false  # Enable rootcoord active-standby

proxy:
  port: 19530
  internalPort: 19529

queryCoord:
  address: milvus-querycoord
  port: 19531
  enableActiveStandby: false  # Enable querycoord active-standby

queryNode:
  port: 21123
  enableDisk: true # Enable querynode load disk index, and search on disk index

indexCoord:
  address: milvus-indexcoord
  port: 31000
  enableActiveStandby: false  # Enable indexcoord active-standby

indexNode:
  port: 21121
  enableDisk: true # Enable index node build disk vector index

dataCoord:
  address: milvus-datacoord
  port: 13333
  enableActiveStandby: false  # Enable datacoord active-standby

dataNode:
  port: 21124

log:
  level: info
  file:
    rootPath: ""
    maxSize: 300
    maxAge: 10
    maxBackups: 20
milvus-user added the kind/bug and needs-triage labels Jun 28, 2024
@yanliang567
Contributor

/assign @LoveEachDay
/unassign

yanliang567 added the triage/accepted label and removed the needs-triage label Jun 28, 2024
@xiaofan-luan
Collaborator


From the log, rootcoord registers 127.0.0.1 as its address. The address it gets comes from:

hostName, hostNameErr := os.Hostname()
if hostNameErr != nil {
	log.Error("get host name fail", zap.Error(hostNameErr))
}

session := &Session{
	ctx:      ctx,
	metaRoot: metaRoot,
	Version:  common.Version,

	SessionRaw: SessionRaw{
		HostName: hostName,
	},

	// options
	sessionTTL:        paramtable.Get().CommonCfg.SessionTTL.GetAsInt64(),
	sessionRetryTimes: paramtable.Get().CommonCfg.SessionRetryTimes.GetAsInt64(),
	reuseNodeID:       true,
	isStopped:         *atomic.NewBool(false),
}

So, most likely you are deploying with our official Docker image. You can use ifconfig to check the network settings.

@milvus-user
Author

Yes, we deployed the official Docker image.

@milvus-user
Author

ifconfig result:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet6 xxxb:cxxx:xxx:xxxx:0:x:x:x prefixlen 64 scopeid 0x0
inet6 fxxx::xxxx:xxx:xxxx:xxxx prefixlen 64 scopeid 0x20
ether xx:xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 31133598 bytes 76781057453 (71.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16900577 bytes 59431116180 (55.3 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1000 (Local Loopback)
RX packets 89192 bytes 957410077 (913.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 89192 bytes 957410077 (913.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:68:16:54 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

@xiaofan-luan
Collaborator

The hostname returned by os.Hostname most likely resolves to 127.0.0.1 rather than 192.168.122.1 because 127.0.0.1 is the address of the loopback interface (lo), which is the default address for the local host. The 192.168.122.1 address belongs to a virtual bridge interface (virbr0), which is typically used for virtual networking, such as with virtual machines or containers.

Here’s a breakdown of what’s happening:

Loopback Interface (lo):
- inet 127.0.0.1 is the loopback address.
- The system uses this address to refer to itself. It is always present and is a common default for hostname resolution on the local machine.

Virtual Bridge Interface (virbr0):
- inet 192.168.122.1 is the IP address assigned to the virtual bridge.
- This address is typically used for networking between virtualized environments and the host system.

Ethernet Interface (eth0):
- No IPv4 address is shown, only IPv6 addresses.

When the hostname is resolved, it can map to the loopback address (127.0.0.1). On many systems the hostname is configured to resolve to a loopback address unless explicitly configured otherwise.

If you want the hostname to resolve to 192.168.122.1, you would need to modify your system’s network configuration. Here’s how you can adjust this on a Linux system:

1. Edit /etc/hosts: add a line to associate your hostname with 192.168.122.1.

   192.168.122.1 your-hostname

2. Ensure network configuration: make sure that virbr0 is up and running with the desired IP address.

3. Restart networking: restart the network service to apply the changes.

   sudo systemctl restart networking

Keep in mind that changing the hostname resolution might affect your system's networking behavior, especially if virbr0 is not always active or if the IP address might change.

For programmatic access, you might need to explicitly query the IP address of the virbr0 interface instead of relying on os.Hostname. This can be done using various libraries or system calls to retrieve the IP address of a specific interface.

@xiaofan-luan
Collaborator

This is what I got from GPT; hopefully it helps.

@milvus-user
Author

The ifconfig result I shared was from the rootcoord pod only, so for itself it was using the 127.0.0.1 address.

@milvus-user
Author

Apart from this, we tried Milvus v2.3.13, and it runs fine with the same config; nothing changed apart from the image.

@milvus-user
Author

reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/07/02 06:45:12.553 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/07/02 06:45:12.553 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=23]
[2024/07/02 06:45:12.554 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/07/02 06:45:12.554 +00:00] [WARN] [grpcclient/client.go:467] ["retry func failed"] [retried=8] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/07/02 06:45:14.814 +00:00] [DEBUG] [config/refresher.go:71] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[milvus-etcd:2379]"]
[2024/07/02 06:45:19.814 +00:00] [DEBUG] [config/refresher.go:71] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[milvus-etcd:2379]"]
[2024/07/02 06:45:22.555 +00:00] [WARN] [retry/retry.go:104] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/07/02 06:45:22.556 +00:00] [DEBUG] [sessionutil/session_util.go:620] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=127.0.0.1:53100]
[2024/07/02 06:45:22.556 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=127.0.0.1:53100] [serverID=23]
[2024/07/02 06:45:22.557 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 127.0.0.1:53100, reason: connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:53100: connect: connection refused""]
[2024/07/02 06:45:24.813 +00:00] [DEBUG] [config/refresher.go:71] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[milvus-etcd:2379]"]

@xiaofan-luan
Collaborator

> apart from this we had tried with milvus v2.3.13 and that is running fine with same config no change apart from image

How did you install Milvus?

This logic has not changed since 2.3.4.

So this is definitely not a bug; it is more of an environment issue.

stale bot commented Aug 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale bot added the stale label Aug 4, 2024
stale bot closed this as completed Aug 11, 2024