[Bug]: milvus standalone container suddenly shut down while etcd and minio are alive. #33757
Comments
It seems that your etcd is slow. Is it deployed on SSDs?
When etcd is slow, Milvus may panic and exit.
It is currently deployed on HDDs. Is there any other way to work around slow etcd?
etcd should be deployed on SSD volumes. Please retry.
You would have to increase the etcd timeout. But our recommendation is to move etcd to a server with SSDs.
Okay, I'll give it a try. But our current data volume and usage are relatively low. Will not using SSDs still cause Milvus to crash?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
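Since the advice in the comments turns on whether the disk under etcd is fast enough, a rough way to check is to time small synchronous writes on the volume that backs the etcd data directory. The following is a minimal sketch, not an official Milvus or etcd tool: the function names and the `ETCD_DATA_DIR` environment variable are made up for this example, and the write pattern only approximates what etcd does when it appends to its write-ahead log.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, samples=200, payload_size=2048):
    """Return the latency in seconds of each write + fsync done in `directory`."""
    payload = b"\0" * payload_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage before continuing
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Summarize latencies (seconds) as mean, p99 (nearest rank), and max in ms."""
    ordered = sorted(latencies)
    # Nearest-rank p99: ceil(0.99 * n) computed with integer arithmetic.
    p99_index = (len(ordered) * 99 + 99) // 100 - 1
    return {
        "mean_ms": sum(ordered) / len(ordered) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # ETCD_DATA_DIR is a hypothetical variable used only by this sketch.
    # Point it at the host directory that backs the etcd volume.
    target = os.environ.get("ETCD_DATA_DIR", tempfile.gettempdir())
    print(summarize(measure_fsync_latency(target)))
```

Run it on the host against the directory mounted into the etcd container. etcd's guidance is that the 99th percentile of WAL fsync duration should stay under roughly 10 ms. Numbers far above that on an HDD would be consistent with the multi-second "Slow etcd operation" warnings in the log, even when the data volume is small.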
Is there an existing issue for this?
Environment
Current Behavior
I created a Milvus container using Docker Compose, but it shut down unexpectedly after running for a while. Both MinIO and etcd are running normally, but the log shows that Milvus cannot connect to etcd.
Expected Behavior
Milvus should keep running normally.
Steps To Reproduce
No response
Milvus Log
[2024/06/07 12:32:51.337 +00:00] [ERROR] [rootcoord/import_manager.go:807] ["import manager failed to load from Etcd"] [error="context deadline exceeded"] [stack="github.com/milvus-io/milvus/internal/rootcoord.(*importManager).loadFromTaskStore\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/import_manager.go:807\ngithub.com/milvus-io/milvus/internal/rootcoord.(*importManager).loadAndFlipPersistedTasks\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/import_manager.go:295\ngithub.com/milvus-io/milvus/internal/rootcoord.(*importManager).flipTaskStateLoop\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/import_manager.go:159"]
[2024/06/07 12:32:51.337 +00:00] [ERROR] [rootcoord/import_manager.go:296] ["failed to load from task store"] [error="context deadline exceeded"] [stack="github.com/milvus-io/milvus/internal/rootcoord.(*importManager).loadAndFlipPersistedTasks\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/import_manager.go:296\ngithub.com/milvus-io/milvus/internal/rootcoord.(*importManager).flipTaskStateLoop\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/import_manager.go:159"]
[2024/06/07 12:32:51.337 +00:00] [ERROR] [rootcoord/import_manager.go:160] ["failed to flip ImportPersisted task"] [error="context deadline exceeded"] [stack="github.com/milvus-io/milvus/internal/rootcoord.(*importManager).flipTaskStateLoop\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/import_manager.go:160"]
[2024/06/07 12:32:52.109 +00:00] [INFO] [observers/collection_observer.go:160] ["observe partitions status"] [partitionNum=33]
[2024/06/07 12:32:53.109 +00:00] [INFO] [observers/collection_observer.go:160] ["observe partitions status"] [partitionNum=33]
[2024/06/07 12:32:53.505 +00:00] [INFO] [proxy/meta_cache.go:887] ["expire all shard leader cache"] [database=default] [collections="[dev_intent_tenant_67,dev_intent_tenant_83,oversea_test_v1_intent_system,demo_intent_system,demo_intent_tenant_19,uat_v1_intent_system,uat_intent_tenant_1,demo_intent_tenant_23,dev_v1_intent_tenant_87,oversea_test_v1_intent_tenant_1,uat_intent_system,iflytek_digit_intent_system,dev_intent_tenant_1,oversea_test_intent_tenant_1,oversea_test_v1_intent_tenant_19,oversea_test_intent_tenant_69,green_develop_intent_system,oversea_test_intent_system,demo_intent_tenant_25,dev_intent_tenant_85,dev_intent_tenant_98,uat_intent_tenant_19,demo_intent_tenant_21,dev_v1_intent_system,dev_v1_intent_tenant_1,dev_v1_intent_tenant_85,dev_v1_intent_tenant_98,dev_cache_sample_tenant_1,dev_intent_system,uat_v1_intent_tenant_1,demo_intent_tenant_1,green_develop_v1_intent_system,dev_intent_tenant_99]"]
[2024/06/07 12:32:54.109 +00:00] [INFO] [observers/collection_observer.go:160] ["observe partitions status"] [partitionNum=33]
[2024/06/07 12:32:54.490 +00:00] [WARN] [etcd/etcd_kv.go:648] ["Slow etcd operation save"] ["time spent"=7.00137856s] [key=by-dev/kv/gid/timestamp]
[2024/06/07 12:32:54.490 +00:00] [WARN] [rootcoord/root_coord.go:236] ["failed to update tso"] [error="etcdserver: request timed out"] [errorVerbose="etcdserver: request timed out\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/tso.(*timestampOracle).saveTimestamp\n | \t/go/src/github.com/milvus-io/milvus/internal/tso/tso.go:98\n | github.com/milvus-io/milvus/internal/tso.(*timestampOracle).UpdateTimestamp\n | \t/go/src/github.com/milvus-io/milvus/internal/tso/tso.go:201\n | github.com/milvus-io/milvus/internal/tso.(*GlobalTSOAllocator).UpdateTSO\n | \t/go/src/github.com/milvus-io/milvus/internal/tso/global_allocator.go:100\n | github.com/milvus-io/milvus/internal/rootcoord.(*Core).tsLoop\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/root_coord.go:235\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) etcdserver: request timed out\nError types: (1) *withstack.withStack (2) rpctypes.EtcdError"]
[2024/06/07 12:32:55.109 +00:00] [INFO] [observers/collection_observer.go:160] ["observe partitions status"] [partitionNum=33]
[2024/06/07 12:32:56.110 +00:00] [INFO] [observers/collection_observer.go:160] ["observe partitions status"] [partitionNum=33]
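To see how often etcd is slow rather than judging from a single excerpt, the "Slow etcd operation" warnings can be pulled out of a longer log. The sketch below is a minimal example that assumes the exact line layout shown above (timestamp, level, source location, quoted message, `"time spent"`, `key`). The helper names and the threshold are illustrative and not part of Milvus.

```python
import re

# Matches the "Slow etcd operation" warning layout seen in the log above.
SLOW_OP = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[WARN\] \[[^\]]+\] '
    r'\["Slow etcd operation (?P<op>[^"]+)"\] '
    r'\["time spent"=(?P<spent>[^\]]+)\] \[key=(?P<key>[^\]]+)\]'
)

# Go prints durations such as "7.00137856s", "350ms", or "1m7.5s".
DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}


def parse_go_duration(text):
    """Convert a Go-style duration string to seconds."""
    parts = DURATION_PART.findall(text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(number) * UNIT_SECONDS[unit] for number, unit in parts)


def slow_etcd_operations(lines, threshold_seconds=1.0):
    """Return (timestamp, operation, key, seconds) for each slow etcd warning."""
    results = []
    for line in lines:
        match = SLOW_OP.match(line)
        if match is None:
            continue
        spent = parse_go_duration(match.group("spent"))
        if spent >= threshold_seconds:
            results.append(
                (match.group("ts"), match.group("op"), match.group("key"), spent)
            )
    return results


if __name__ == "__main__":
    sample = [
        '[2024/06/07 12:32:54.490 +00:00] [WARN] [etcd/etcd_kv.go:648] '
        '["Slow etcd operation save"] ["time spent"=7.00137856s] '
        '[key=by-dev/kv/gid/timestamp]',
    ]
    for ts, op, key, seconds in slow_etcd_operations(sample):
        print(f"{ts} {op} {key} took {seconds:.2f}s")
```

Feeding it the full container log (for example, the saved output of `docker logs`) gives a quick picture of whether the 7-second save on `by-dev/kv/gid/timestamp` is a one-off or a recurring pattern.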