failed to get channels, collection not loaded #27485

Closed
sadwsh opened this issue Oct 6, 2023 · 17 comments
Assignees
Labels
kind/bug (Issues or changes related to a bug), stale (indicates no updates for 30 days), triage/accepted (Indicates an issue or PR is ready to be actively worked on.)
Milestone

Comments

sadwsh commented Oct 6, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.3.1
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2): milvus-sdk-node 2.3.1
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I have a standalone installation of Milvus with Docker Compose. Using the Attu tool I can easily connect, create a collection and populate it with the sample data. Collection data also gets loaded and I can perform queries without any issues.
However, in my production environment, a standalone installation on a Kubernetes cluster (deployed with Helm), the same collection fails to load after creation, and after some time I get a timeout notice.

The initial error message reads:
"failed to get channels, collection not loaded: collection=444710574464845978: collection not found"

This is somewhat odd, as the collection is populated with data and is displayed in Attu.
I'm not sure whether this is related to memory issues, but loading a collection with only 100 entities shouldn't be a big deal!
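
For context, a minimal pymilvus sketch of the step that fails (the actual setup used Attu and milvus-sdk-node; the collection name and connection details below are placeholders):

from pymilvus import connections, Collection

# connect to the standalone instance, e.g. reached via kubectl port-forward (placeholder address)
connections.connect(host="localhost", port="19530")

coll = Collection("my_collection")   # placeholder name for the collection created in Attu
coll.load(timeout=120)               # the step that stalls and eventually times out in the cluster deployment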

Expected Behavior

The collection should be loaded into memory.

Steps To Reproduce

No response

Milvus Log

[2023/10/06 12:29:00.212 +00:00] [INFO] [querycoordv2/services.go:787] ["get replicas request received"] [traceID=438b2f70d1034f2a08e6845efca866ae] [collectionID=444710574464845978] [with-shard-nodes=false]
2023-10-06T14:29:00.212777686+02:00 [2023/10/06 12:29:00.212 +00:00] [WARN] [querycoordv2/handlers.go:321] ["failed to get channels, collection not loaded"]
[2023/10/06 12:29:00.212 +00:00] [WARN] [querycoordv2/services.go:824] ["failed to get replica info"] [traceID=438b2f70d1034f2a08e6845efca866ae] [collectionID=444710574464845978] [replica=444710575524806673] [error="failed to get channels, collection not loaded: collection=444710574464845978: collection not found"] [errorVerbose="failed to get channels, collection not loaded: collection=444710574464845978: collection not found\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrCollectionNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:244\n | [...repeated from below...]\nWraps: (2) failed to get channels, collection not loaded\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.wrapWithField\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:531\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrCollectionNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:242\n | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).fillReplicaInfo\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/handlers.go:322\n | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).GetReplicas\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/services.go:822\n | github.com/milvus-io/milvus/internal/distributed/querycoord.(*Server).GetReplicas\n | \t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/service.go:379\n | github.com/milvus-io/milvus/internal/proto/querypb._QueryCoord_GetReplicas_Handler.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5537\n | github.com/milvus-io/milvus/pkg/util/interceptor.ServerIDValidationUnaryServerInterceptor.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/interceptor/server_id_interceptor.go:54\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.
com/grpc-ecosystem/[email protected]/chain.go:25\n | github.com/milvus-io/milvus/pkg/util/interceptor.ClusterValidationUnaryServerInterceptor.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/interceptor/cluster_interceptor.go:48\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\n | github.com/milvus-io/milvus/pkg/util/logutil.UnaryTraceLoggerInterceptor\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/logutil/grpc_interceptor.go:23\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\n | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n | \t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:342\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34\n | github.com/milvus-io/milvus/internal/proto/querypb._QueryCoord_GetReplicas_Handler\n | \t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5539\n | google.golang.org/grpc.(*Server).processUnaryRPC\n | \t/go/pkg/mod/google.golang.org/[email protected]/server.go:1345\n | google.golang.org/grpc.(*Server).handleStream\n | \t/go/pkg/mod/google.golang.org/[email protected]/server.go:1722\n | google.golang.org/grpc.(*Server).serveStreams.func1.2\n | \t/go/pkg/mod/google.golang.org/[email protected]/server.go:966\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (4) collection=444710574464845978\nWraps: (5) collection not found\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4)
*errutil.withPrefix (5) merr.milvusError"]

Anything else?

No response

@sadwsh sadwsh added the kind/bug (Issues or changes related to a bug) and needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) labels Oct 6, 2023
yanliang567 (Contributor) commented:

@sadwsh Could you please refer to this doc to export the full Milvus logs for investigation?
/assign @sadwsh
/unassign

@sre-ci-robot sre-ci-robot assigned sadwsh and unassigned yanliang567 Oct 7, 2023
@yanliang567 yanliang567 added the triage/needs-information (Indicates an issue needs more information in order to work on it.) label and removed the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) label Oct 7, 2023
Felix0525 commented:

Any suggestions? I'm hitting the same problem too.

xiaofan-luan (Collaborator) commented:

You will need to load the collection before searching.
The common way to use Milvus is (see the sketch below):
create collection
build index
load
insert
search
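
A minimal pymilvus sketch of that sequence, with illustrative names and parameters (the reporter used milvus-sdk-node, which follows the same order of calls):

import random
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")

# create collection
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=8),
]
coll = Collection("demo", CollectionSchema(fields))

# build index
coll.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)

# load into memory
coll.load()

# insert 100 rows (one column per non-auto-id field), then optionally seal the segment
coll.insert([[[random.random() for _ in range(8)] for _ in range(100)]])
coll.flush()

# search
results = coll.search(
    data=[[random.random() for _ in range(8)]],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5,
)
print(results[0].ids)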

sadwsh (Author) commented Oct 8, 2023

You will need to load the collection before searching. The common way to use Milvus is: create collection, build index, load, insert, search

@xiaofan-luan The whole point here is that the collection won't load, so search and query operations can't be performed at all!

sadwsh (Author) commented Oct 9, 2023

@sadwsh Could you please refer to this doc to export the full Milvus logs for investigation? /assign @sadwsh /unassign

@yanliang567 Here are my exported Milvus logs containing the collection-load error, for reference:
milvus-log.tar.gz

xiaofan-luan (Collaborator) commented:

(screenshot attached)

xiaofan-luan (Collaborator) commented:

From the log, we hit a similar problem where the consume position moves backward.

/assign @bigsheeper
/assign @weiliu1031

xiaofan-luan (Collaborator) commented:

@sadwsh Can you explain what operations you performed before you hit this issue?

sadwsh (Author) commented Oct 9, 2023

@sadwsh Can you explain what operations you performed before you hit this issue?

@xiaofan-luan I basically just created a simple collection with 3 fields and populated it with sample data. This was all done using the Attu tool, with kubectl port-forward to connect to the pod from my local machine. Everything works fine except loading the collection into memory, which gets stuck at 50% progress and eventually hits a timeout.

Here's a screenshot of the collection details:
(screenshot attached)
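
As a hedged diagnostic sketch (pymilvus, placeholder collection name), triggering the load programmatically and printing the server-reported progress on timeout can show where it stalls, complementing the Attu progress bar:

from pymilvus import connections, Collection, utility, MilvusException

connections.connect(host="localhost", port="19530")
coll = Collection("my_collection")          # placeholder name

try:
    coll.load(timeout=300)                  # blocks until loaded, raises on timeout
except MilvusException as exc:
    print("load failed:", exc)
    # progress as reported by the query coordinator, e.g. {'loading_progress': '50%'}
    print(utility.loading_progress("my_collection"))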

xiaofan-luan (Collaborator) commented:

How many partitions do you have?
From the log you provided, the watch DML request has been timing out for a long time (or it never succeeded for some reason, but the log doesn't cover that period).

@yanliang567 yanliang567 added the triage/accepted (Indicates an issue or PR is ready to be actively worked on.) label and removed the triage/needs-information (Indicates an issue needs more information in order to work on it.) label Oct 10, 2023
@yanliang567 yanliang567 added this to the 2.3.2 milestone Oct 10, 2023
sadwsh (Author) commented Oct 10, 2023

How many partitions do you have? From the log you provided, the watch DML request has been timing out for a long time (or it never succeeded for some reason, but the log doesn't cover that period).

@xiaofan-luan There's only one partition for that collection and that's the default partition i.e. _default.

(screenshot attached)

sadwsh (Author) commented Oct 23, 2023

@yanliang567 @xiaofan-luan
Any estimate on when there will be a patch or new release fixing this issue? The v2.3.2 milestone is long overdue and is only at 40% progress so far. I'm considering alternative vector databases, as I need to get a production-ready app deployed ASAP. Any suggestions for a quick workaround?

@yanliang567 yanliang567 modified the milestones: 2.3.2, 2.3.3 Nov 7, 2023
@yanliang567 yanliang567 modified the milestones: 2.3.3, 2.3.4 Nov 16, 2023

stale bot commented Dec 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale (indicates no updates for 30 days) label Dec 16, 2023
@stale stale bot closed this as completed Dec 23, 2023
yaliqin commented Feb 27, 2024

I've hit the same issue.

yanliang567 (Contributor) commented:

Please try the latest Milvus, v2.3.10, and if the load still fails, please file a new issue with the Milvus logs attached.

Lowpower commented:

Upgraded to 3.11; same problem:

[2024/03/11 11:08:46.035 +00:00] [WARN] [querycoordv2/handlers.go:319] ["failed to get channels, collection not loaded"]
[2024/03/11 11:08:46.035 +00:00] [WARN] [querycoordv2/services.go:821] ["failed to get replica info"] [traceID=ea100fd12bbb377f7bbb31ede75e9740] [collectionID=446631422104163043] [replica=446631422317035535] [error="failed to get channels, collection not loaded: collection not found[collection=446631422104163043]"] [errorVerbose="failed to get channels, collection not loaded: collection not found[collection=446631422104163043]\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrCollectionNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:424\n | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).fillReplicaInfo\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/handlers.go:320\n | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).GetReplicas\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/services.go:819\n | github.com/milvus-io/milvus/internal/distributed/querycoord.(*Server).GetReplicas\n | \t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/service.go:388\n | github.com/milvus-io/milvus/internal/proto/querypb._QueryCoord_GetReplicas_Handler.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5586\n | github.com/milvus-io/milvus/pkg/util/interceptor.ServerIDValidationUnaryServerInterceptor.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/interceptor/server_id_interceptor.go:54\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\n | github.com/milvus-io/milvus/pkg/util/interceptor.ClusterValidationUnaryServerInterceptor.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/interceptor/cluster_interceptor.go:48\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\n | github.com/milvus-io/milvus/pkg/util/logutil.UnaryTraceLoggerInterceptor\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/logutil/grpc_interceptor.go:23\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\n | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n | \t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:342\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25\n | github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34\n | github.com/milvus-io/milvus/internal/proto/querypb._QueryCoord_GetReplicas_Handler\n | \t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5588\n | google.golang.org/grpc.(*Server).processUnaryRPC\n | \t/go/pkg/mod/google.golang.org/[email protected]/server.go:1345\n | google.golang.org/grpc.(*Server).handleStream\n | \t/go/pkg/mod/google.golang.org/[email protected]/server.go:1722\n | google.golang.org/grpc.(*Server).serveStreams.func1.2\n | \t/go/pkg/mod/google.golang.org/[email protected]/server.go:966\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) failed to get channels, collection not loaded\nWraps: (3) collection not found[collection=446631422104163043]\nError types: (1) 
*withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]

xiaofan-luan (Collaborator) commented:

Upgraded to 3.11; same problem: (log quoted above)

Could you provide the querycoord and all querynode logs?
It seems that a channel is offline and cannot be recovered.
