[Bug]: rootcoord runtime error,querycoord cannot running #35174

Closed
1 task done
syang1997 opened this issue Aug 1, 2024 · 5 comments
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@syang1997

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:  2.3.15 and 2.3.20
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): not
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

querycoord cannot connect to rootcoord, and rootcoord logs the following errors:

2024-08-01T15:41:07.366245482+08:00 [2024/08/01 07:41:07.366 +00:00] [WARN] [retry/retry.go:102] ["retry func failed"] [retried=0] [error="empty grpc client: find no available querycoord, check querycoord state"] [errorVerbose="empty grpc client: find no available querycoord, check querycoord state\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call.func2\n  | \t/workspace/source/internal/util/grpcclient/client.go:474\n  | github.com/milvus-io/milvus/pkg/util/retry.Handle\n  | \t/workspace/source/pkg/util/retry/retry.go:100\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call\n  | \t/workspace/source/internal/util/grpcclient/client.go:467\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | \t/workspace/source/internal/util/grpcclient/client.go:551\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | \t/workspace/source/internal/util/grpcclient/client.go:567\n  | github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]\n  | \t/workspace/source/internal/distributed/querycoord/client/client.go:95\n  | github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n  | \t/workspace/source/internal/distributed/querycoord/client/client.go:267\n  | github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n  | \t/workspace/source/internal/rootcoord/quota_center.go:253\n  | golang.org/x/sync/errgroup.(*Group).Go.func1\n  | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) empty grpc client\nWraps: (3) find no available querycoord, check querycoord state\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString"]
2024-08-01T15:41:07.567041526+08:00 [2024/08/01 07:41:07.566 +00:00] [WARN] [grpcclient/client.go:475] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=querycoord] [error="empty grpc client: find no available querycoord, check querycoord state"] [errorVerbose="empty grpc client: find no available querycoord, check querycoord state\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call.func2\n  | \t/workspace/source/internal/util/grpcclient/client.go:474\n  | github.com/milvus-io/milvus/pkg/util/retry.Handle\n  | \t/workspace/source/pkg/util/retry/retry.go:100\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call\n  | \t/workspace/source/internal/util/grpcclient/client.go:467\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | \t/workspace/source/internal/util/grpcclient/client.go:551\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | \t/workspace/source/internal/util/grpcclient/client.go:567\n  | github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]\n  | \t/workspace/source/internal/distributed/querycoord/client/client.go:95\n  | github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n  | \t/workspace/source/internal/distributed/querycoord/client/client.go:267\n  | github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n  | \t/workspace/source/internal/rootcoord/quota_center.go:253\n  | golang.org/x/sync/errgroup.(*Group).Go.func1\n  | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) empty grpc client\nWraps: (3) find no available querycoord, check querycoord state\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString"]
2024-08-01T15:41:07.568249420+08:00 [2024/08/01 07:41:07.568 +00:00] [WARN] [grpcclient/client.go:249] ["failed to get client address"] [error="find no available querycoord, check querycoord state"]
2024-08-01T15:41:07.568256588+08:00 [2024/08/01 07:41:07.568 +00:00] [WARN] [grpcclient/client.go:461] ["fail to get grpc client in the retry state"] [client_role=querycoord] [error="find no available querycoord, check querycoord state"]
2024-08-01T15:41:07.969055809+08:00 [2024/08/01 07:41:07.968 +00:00] [WARN] [grpcclient/client.go:475] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=querycoord] [error="empty grpc client: find no available querycoord, check querycoord state"] [errorVerbose="empty grpc client: find no available querycoord, check querycoord state\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call.func2\n  | \t/workspace/source/internal/util/grpcclient/client.go:474\n  | github.com/milvus-io/milvus/pkg/util/retry.Handle\n  | \t/workspace/source/pkg/util/retry/retry.go:100\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call\n  | \t/workspace/source/internal/util/grpcclient/client.go:467\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | \t/workspace/source/internal/util/grpcclient/client.go:551\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | \t/workspace/source/internal/util/grpcclient/client.go:567\n  | github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]\n  | \t/workspace/source/internal/distributed/querycoord/client/client.go:95\n  | github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n  | \t/workspace/source/internal/distributed/querycoord/client/client.go:267\n  | github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n  | \t/workspace/source/internal/rootcoord/quota_center.go:253\n  | golang.org/x/sync/errgroup.(*Group).Go.func1\n  | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) empty grpc client\nWraps: (3) find no available querycoord, check querycoord state\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString"]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@syang1997 syang1997 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 1, 2024
@syang1997 syang1997 changed the title [Bug]: rootcoord runtime error,querycoord canot running [Bug]: rootcoord runtime error,querycoord cannot running Aug 1, 2024
@syang1997
Author

@yanliang567 Can you help me check?

@syang1997
Author

proxy error log

2024-08-01T15:49:19.446980238+08:00 [2024/08/01 07:49:19.446 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 10.221.20.19:53100, reason: context deadline exceeded: connection error: desc = \"transport: error while dialing: dial tcp 10.221.20.19:53100: i/o timeout\""]
2024-08-01T15:49:29.331313115+08:00 [2024/08/01 07:49:29.330 +00:00] [DEBUG] [config/refresher.go:70] ["etcd refreshConfigurations"] [prefix=aks-us-c-recplt-1-b/config] [endpoints="[aks-us-c-recplt-1-b-etcd.shein-paas-component:2379]"]
2024-08-01T15:49:29.447913734+08:00 [2024/08/01 07:49:29.447 +00:00] [WARN] [retry/retry.go:100] ["grpc client is nil, maybe fail to get client in the retry state"] [client_role=rootcoord] [error="empty grpc client: failed to connect 10.221.20.19:53100, reason: context deadline exceeded: connection error: desc = \"transport: error while dialing: dial tcp 10.221.20.19:53100: i/o timeout\""]
2024-08-01T15:49:29.450016602+08:00 [2024/08/01 07:49:29.449 +00:00] [DEBUG] [sessionutil/session_util.go:618] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord] [address=10.221.20.19:53100]
2024-08-01T15:49:29.450021312+08:00 [2024/08/01 07:49:29.449 +00:00] [DEBUG] [sessionutil/session_util.go:618] ["SessionUtil GetSessions"] [prefix=rootcoord] [key=rootcoord-10] [address=10.221.20.19:53100]
2024-08-01T15:49:29.450033595+08:00 [2024/08/01 07:49:29.449 +00:00] [DEBUG] [client/client.go:93] ["RootCoordClient GetSessions success"] [address=10.221.20.19:53100] [serverID=10]
2024-08-01T15:49:29.651077075+08:00 [2024/08/01 07:49:29.650 +00:00] [WARN] [grpcclient/client.go:476] ["fail to get grpc client in the retry state"] [client_role=rootcoord] [error="failed to connect 10.221.20.19:53100, reason: context deadline exceeded: connection error: desc = \"transport: error while dialing: dial tcp 10.221.20.19:53100: i/o timeout\""]
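To see at a glance which component the failures point at, the WARN lines above can be summarized by client role and dial target. This is a minimal sketch, assuming the log lines keep the bracketed `[client_role=...]` field and the `dial tcp host:port` fragment exactly as they appear above; the function name `summarize_dial_failures` is hypothetical and not part of Milvus.

```python
import re
from collections import Counter

# Field and address patterns taken from the log lines in this comment.
ROLE_RE = re.compile(r"\[client_role=(\w+)\]")
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}:\d+)")


def summarize_dial_failures(lines):
    """Count log lines per (client_role, dialed address).

    Lines that lack either field (DEBUG lines, or errors that never
    reached the dial step) are skipped.
    """
    counts = Counter()
    for line in lines:
        role = ROLE_RE.search(line)
        dial = DIAL_RE.search(line)
        if role and dial:
            counts[(role.group(1), dial.group(1))] += 1
    return counts
```

Fed the proxy log above, every counted line should fall under `("rootcoord", "10.221.20.19:53100")`, meaning the proxy resolves the rootcoord session but times out when dialing that address.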

@binbinlv
Contributor

binbinlv commented Aug 1, 2024

@syang1997
Could you please refer to this doc to export the full Milvus logs for investigation?
Thanks.

@binbinlv binbinlv added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 1, 2024
@syang1997
Author

@syang1997 Could you please refer to this doc to export the full Milvus logs for investigation? Thanks.

I don't have machine permissions in this environment, so I can't run the log export script.

@xiaofan-luan
Collaborator

Has this problem been solved?
It looks like a network connectivity issue.
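One quick way to test the connectivity theory is to try opening a plain TCP connection to the address the proxy is timing out on. The snippet below is a minimal sketch using only the Python standard library; the helper name `can_connect` is hypothetical, and it only checks that the TCP handshake succeeds, not that the gRPC service behind the port is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

Running `can_connect("10.221.20.19", 53100)` from inside the proxy pod (the address comes from the proxy log above) would show whether the rootcoord port is reachable from there. A `False` result would be consistent with a network policy, firewall, or stale session address rather than a Milvus bug.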

4 participants