[Bug]: [benchmark][cluster] Milvus is stuck and the client has no response #38609

Open
1 task done
wangting0128 opened this issue Dec 20, 2024 · 5 comments
Labels: kind/bug, test/benchmark, triage/accepted

@wangting0128
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20241219-306e5e68-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2):2.5.0rc124
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-46vpj

server:

NAME                                                              READY   STATUS      RESTARTS       AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-qz2xh-28-2434-etcd-0                                      1/1     Running     0              39h     10.104.24.169   4am-node29   <none>           <none>
fouramf-qz2xh-28-2434-etcd-1                                      1/1     Running     0              39h     10.104.27.189   4am-node31   <none>           <none>
fouramf-qz2xh-28-2434-etcd-2                                      1/1     Running     0              39h     10.104.25.170   4am-node30   <none>           <none>
fouramf-qz2xh-28-2434-milvus-datanode-67f777d989-d9479            1/1     Running     0              21h     10.104.33.6     4am-node36   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-2d2cd           1/1     Running     0              21h     10.104.21.12    4am-node24   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-78xnh           1/1     Running     0              21h     10.104.17.74    4am-node23   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-dcrm4           1/1     Running     0              20h     10.104.27.156   4am-node31   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-dgcf6           1/1     Running     0              20h     10.104.23.28    4am-node27   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-nm792           1/1     Running     0              21h     10.104.9.175    4am-node14   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-vvxtf           1/1     Running     0              20h     10.104.25.150   4am-node30   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-zgqpg           1/1     Running     0              21h     10.104.34.232   4am-node37   <none>           <none>
fouramf-qz2xh-28-2434-milvus-indexnode-85c5b599d4-zl7kv           1/1     Running     0              20h     10.104.20.22    4am-node22   <none>           <none>
fouramf-qz2xh-28-2434-milvus-mixcoord-857b5f7d4-czj5b             1/1     Running     0              21h     10.104.21.11    4am-node24   <none>           <none>
fouramf-qz2xh-28-2434-milvus-proxy-5d6c86c556-99bgp               1/1     Running     0              21h     10.104.23.228   4am-node27   <none>           <none>
fouramf-qz2xh-28-2434-milvus-querynode-7575fcff6-wkcxf            1/1     Running     0              21h     10.104.33.7     4am-node36   <none>           <none>
fouramf-qz2xh-28-2434-minio-0                                     1/1     Running     0              3d1h    10.104.24.77    4am-node29   <none>           <none>
fouramf-qz2xh-28-2434-minio-1                                     1/1     Running     0              3d1h    10.104.25.206   4am-node30   <none>           <none>
fouramf-qz2xh-28-2434-minio-2                                     1/1     Running     0              3d1h    10.104.15.36    4am-node20   <none>           <none>
fouramf-qz2xh-28-2434-minio-3                                     1/1     Running     0              3d1h    10.104.27.135   4am-node31   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-bookie-0                           1/1     Running     0              3d1h    10.104.25.205   4am-node30   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-bookie-1                           1/1     Running     0              3d1h    10.104.27.132   4am-node31   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-bookie-2                           1/1     Running     0              3d1h    10.104.24.81    4am-node29   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-broker-0                           1/1     Running     0              3d1h    10.104.15.31    4am-node20   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-broker-1                           1/1     Running     0              3d1h    10.104.13.106   4am-node16   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-proxy-0                            1/1     Running     0              3d1h    10.104.25.197   4am-node30   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-proxy-1                            1/1     Running     0              3d1h    10.104.13.104   4am-node16   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-recovery-0                         1/1     Running     0              3d1h    10.104.15.34    4am-node20   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-zookeeper-0                        1/1     Running     0              3d1h    10.104.24.78    4am-node29   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-zookeeper-1                        1/1     Running     0              3d1h    10.104.25.204   4am-node30   <none>           <none>
fouramf-qz2xh-28-2434-pulsarv3-zookeeper-2                        1/1     Running     0              3d1h    10.104.20.24    4am-node22   <none>           <none>

The following requests received no response:

client logs:

1. [2024-12-20 01:57:21,408 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_dior520E hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:21,408 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01754d60>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01754610>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01778d90>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01778dc0>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c009b696-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

2. [2024-12-20 01:57:21,411 - DEBUG - fouram]: (api_request)  : [Partition] args: [<Collection>:
-------------
<name>: fouram_WlWdokxT
<description>: 
<schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}, {'name': 'float_vector_1', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 768}}, {'name': 'sparse_float_vector', 'description': '', 'type': <DataType.SPARSE_FLOAT_VECTOR: 104>}, {'name': 'bfloat16_vector', 'description': '', 'type': <DataType.BFLOAT16_VECTOR: 103>, 'params': {'dim': 256}}, {'name': 'int64_1', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'varchar_1', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}], 'enable_dynamic_field': False}
, 'scene_test_partition_hybrid_search_rCTUELMU', ''], kwargs: {}, [requestId: c00a1e9c-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)


3. [2024-12-20 01:57:23,434 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_pln8a36H hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:23,434 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173aa00>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01750e80>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01750100>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e017509d0>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c13ede74-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

4. [2024-12-20 01:57:23,437 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_EVUcR0GU hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:23,437 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173eb20>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173e130>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173e520>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173e7f0>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c13f512e-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

5. [2024-12-20 01:57:23,443 - DEBUG - fouram]: (api_request)  : [Collection] args: ['scene_hybrid_search_test_32JqTbe6', {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}, {'name': 'binary_vector_scene_hybrid_search_test_1', 'description': '', 'type': <DataType.BINARY_VECTOR: 100>, 'params': {'dim': 512}}, {'name': 'float16_vector_scene_hybrid_search_test_2', 'description': '', 'type': <DataType.FLOAT16_VECTOR: 102>, 'params': {'dim': 64}}, {'name': 'sparse_float_vector_scene_hybrid_search_test_3', 'description': '', 'type': <DataType.SPARSE_FLOAT_VECTOR: 104>}, {'name': 'int64_1', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'bool_1', 'description': '', 'type': <DataType.BOOL: 1>}, {'name': 'varchar_1', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}], 'enable_dynamic_field': False}, 'default'], kwargs: {'shards_num': 2}, [requestId: c1403008-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

6. [2024-12-20 01:57:23,478 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_2OpXfVmv hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:23,479 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4d0868c160>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4d0868cca0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4d0868c700>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4d0868c7c0>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c1459d86-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

7. [2024-12-20 01:57:23,483 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_sLCneRiV hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:23,483 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173f400>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173f910>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173f070>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173f0a0>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c14653ac-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

8. [2024-12-20 01:57:23,487 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_DRbzmWnb hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:23,487 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4d0868c1f0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4d0868cb20>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e017483d0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01748100>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c146ef4c-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

9. [2024-12-20 01:57:23,537 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_zk5IhoeN hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:23,537 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0174ea30>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0174e2e0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0174e6d0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0174eac0>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c14e7cc6-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

10. [2024-12-20 01:57:23,538 - DEBUG - fouram]: (api_request)  : [load_state] args: ['scene_hybrid_search_test_KpWfACG6', None, 'default'], kwargs: {}, [requestId: c14eb4ca-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

11. [2024-12-20 01:57:23,541 - DEBUG - fouram]: [Base] Params of concurrent_hybrid_search reqs: [{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': 'int64_1 > 100000', 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': 'id < 900000', 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': 'varchar_1 > "1"', 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], {'rerank': {'strategy': 'weighted', 'params': {'weights': [0.85, 0.95, 0.51, 0.32]}}, 'limit': 100, 'output_fields': ['*'], 'ignore_growing': False, 'guarantee_timestamp': None, 'partition_names': ['_default'], 'timeout': 6000} (base.py:882)
[2024-12-20 01:57:23,541 - DEBUG - fouram]: (api_request)  : [Collection.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173a130>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173a1f0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173adc0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0173ad00>], <pymilvus.client.abstract.WeightedRanker object at 0x7f4bf504b7f0>, 100, ['_default'], ['*'], 6000, -1], kwargs: {}, [requestId: c14f14ce-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

12. [2024-12-20 01:57:32,792 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_GnMvTND0 hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:32,792 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4d97da3700>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01322af0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01322040>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01322820>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c6d2b90a-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

13. [2024-12-20 01:57:34,091 - DEBUG - fouram]: [Base] Params of partition:scene_test_partition_hybrid_search_znlSqRpc hybrid_search: reqs:[{'anns_field': 'float_vector', 'param': {'metric_type': 'L2', 'params': {'ef': 32}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'float_vector_1', 'param': {'metric_type': 'IP', 'params': {'search_list': 30}}, 'limit': 10, 'expr': None, 'nq': 1}, {'anns_field': 'sparse_float_vector', 'param': {'metric_type': 'IP', 'params': {'drop_ratio_search': 0.3}}, 'limit': 30, 'expr': None, 'nq': 1}, {'anns_field': 'bfloat16_vector', 'param': {'metric_type': 'L2', 'params': {'nprobe': 16}}, 'limit': 400, 'expr': None, 'nq': 1}], rerank:{'strategy': 'rrf', 'params': {'k': 60}}, limit:1, timeout:6000, kwargs:{'check_task': 'check_response'} (base.py:868)
[2024-12-20 01:57:34,091 - DEBUG - fouram]: (api_request)  : [Partition.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e01322a00>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e0174b700>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e017547f0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7f4e017542b0>], <pymilvus.client.abstract.RRFRanker object at 0x7f4bf504b1f0>, 1, ['*'], 6000, -1], kwargs: {}, [requestId: c798f7be-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)

14. [2024-12-20 00:17:23,486 - DEBUG - fouram]: (api_request)  : [Collection] args: ['scene_test_fcIPtZco', {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': False}, 'default'], kwargs: {'shards_num': 2}, [requestId: c8ff5d04-be67-11ef-8acf-7ee0480c3f50] (api_request.py:77)

15. [2024-12-20 00:17:32,789 - DEBUG - fouram]: (api_request)  : [Collection] args: ['scene_test_2cEmF7ST', {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': False}, 'default'], kwargs: {'shards_num': 2}, [requestId: ce8aff76-be67-11ef-8acf-7ee0480c3f50] (api_request.py:77)

16. [2024-12-19 22:37:23,492 - DEBUG - fouram]: (api_request)  : [Collection.query] args: ['int64_1 > -1 &&   11013559 < id < 11013559 + 1000000', ['*'], ['_default'], 6000], kwargs: {'limit': 10}, [requestId: d0b8fae0-be59-11ef-8acf-7ee0480c3f50] (api_request.py:77)

17. [2024-12-19 22:37:23,674 - DEBUG - fouram]: (api_request)  : [Collection.query] args: ['int64_1 > -1 &&   3099030 < id < 3099030 + 1000000', ['*'], ['_default'], 6000], kwargs: {'limit': 10}, [requestId: d0d4af92-be59-11ef-8acf-7ee0480c3f50] (api_request.py:77)

18. [2024-12-19 22:37:25,927 - DEBUG - fouram]: (api_request)  : [Index] args: [<Collection>:
-------------
<name>: scene_hybrid_search_test_0gCWkq1i
<description>: 
<schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}, {'name': 'binary_vector_scene_hybrid_search_test_1', 'description': '', 'type': <DataType.BINARY_VECTOR: 100>, 'params': {'dim': 512}}, {'name': 'float16_vector_scene_hybrid_search_test_2', 'description': '', 'type': <DataType.FLOAT16_VECTOR: 102>, 'params': {'dim': 64}}, {'name': 'sparse_float_vector_scene_hybrid_search_test_3', 'description': '', 'type': <DataType.SPARSE_FLOAT_VECTOR: 104>}, {'name': 'int64_1', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'bool_1', 'description': '', 'type': <DataType.BOOL: 1>}, {'name': 'varchar_1', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}], 'enable_dynamic_field': False}
, 'varchar_1', {'index_type': 'INVERTED'}], kwargs: {}, [requestId: d22c93a0-be59-11ef-8acf-7ee0480c3f50] (api_request.py:77)


19. [2024-12-19 22:37:28,701 - DEBUG - fouram]: (api_request)  : [Index] args: [<Collection>:
-------------
<name>: scene_test_0OLcYjs3
<description>: 
<schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': False}
, 'float_vector', {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 2048}}], kwargs: {}, [requestId: d3d3dcea-be59-11ef-8acf-7ee0480c3f50] (api_request.py:77)

20. [2024-12-19 22:37:28,702 - DEBUG - fouram]: (api_request)  : [Index] args: [<Collection>:
-------------
<name>: scene_test_INZX5wLa
<description>: 
<schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': False}
, 'float_vector', {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 2048}}], kwargs: {}, [requestId: d3d3f658-be59-11ef-8acf-7ee0480c3f50] (api_request.py:77)
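
To make the timing of the stuck requests easier to see, here is a minimal sketch that parses three single-line entries copied from the client log above (items 16, 14 and 10) and prints the gap between them. The regular expression and the helper name `parse_entries` are assumptions for illustration, not part of the fouram client. With these entries the gaps round to 6000 seconds, the same value as the `timeout` passed in the requests; whether that means each worker was blocked until its previous request timed out is only a guess.

```python
import re
from datetime import datetime

# Entries copied verbatim from the client log in this issue.
LOG_LINES = [
    "[2024-12-19 22:37:23,492 - DEBUG - fouram]: (api_request)  : [Collection.query] args: ['int64_1 > -1 &&   11013559 < id < 11013559 + 1000000', ['*'], ['_default'], 6000], kwargs: {'limit': 10}, [requestId: d0b8fae0-be59-11ef-8acf-7ee0480c3f50] (api_request.py:77)",
    "[2024-12-20 00:17:23,486 - DEBUG - fouram]: (api_request)  : [Collection] args: ['scene_test_fcIPtZco', {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': False}, 'default'], kwargs: {'shards_num': 2}, [requestId: c8ff5d04-be67-11ef-8acf-7ee0480c3f50] (api_request.py:77)",
    "[2024-12-20 01:57:23,538 - DEBUG - fouram]: (api_request)  : [load_state] args: ['scene_hybrid_search_test_KpWfACG6', None, 'default'], kwargs: {}, [requestId: c14eb4ca-be75-11ef-8acf-7ee0480c3f50] (api_request.py:77)",
]

# Assumed pattern for single-line api_request entries: timestamp, operation, requestId.
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - \w+ - \w+\]: "
    r"\(api_request\)\s*:\s*\[(?P<op>[\w.]+)\].*"
    r"\[requestId: (?P<rid>[0-9a-f-]+)\]"
)


def parse_entries(lines):
    """Return (timestamp, operation, request_id) tuples sorted by timestamp."""
    entries = []
    for line in lines:
        match = ENTRY.search(line)
        if match is None:
            continue  # multi-line entries are not handled by this sketch
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        entries.append((ts, match.group("op"), match.group("rid")))
    return sorted(entries)


entries = parse_entries(LOG_LINES)
# Gap in whole seconds between consecutive requests.
gaps = [round((b[0] - a[0]).total_seconds()) for a, b in zip(entries, entries[1:])]

for ts, op, rid in entries:
    print(ts.isoformat(sep=" "), op, rid)
print("gaps (s):", gaps)
```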

Expected Behavior

No response

Steps To Reproduce

1. re-build index on an existing collection with 20m data
   - HNSW: float_vector
   - DISKANN: float_vector_1
   - SPARSE_INVERTED_INDEX: sparse_float_vector
   - IVF_SQ8: bfloat16_vector
   - INVERTED: int64_1, varchar_1
2. load collection
3. concurrent requests
   - scene_hybrid_search_test
     (collection: create->insert->flush->index->load->hybrid_search->drop)
   - scene_test
     (collection: create->insert->flush->index->drop)
   - scene_test_partition_hybrid_search
     (partition: create->insert->flush->index again->load->hybrid_search->release->hybrid_search failed->drop)
   - search
   - hybrid_search
   - query
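
When reproducing the steps above, a client-side deadline can help flag a request that never returns well before the 6000-second timeout expires. The following is a minimal sketch using only the Python standard library; `call_with_deadline` and `fake_request` are hypothetical names, and `fake_request` merely stands in for a real SDK call such as a hybrid search.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(fn, deadline_s, *args, **kwargs):
    """Run fn in a worker thread and report ("ok", result) or ("stuck", None)."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return "ok", future.result(timeout=deadline_s)
    except FutureTimeout:
        return "stuck", None
    finally:
        # Do not block on a hung worker here. Note that the interpreter still
        # joins worker threads at exit, so a truly hung call delays shutdown.
        pool.shutdown(wait=False)


def fake_request(delay_s):
    """Hypothetical stand-in for a client request that takes delay_s seconds."""
    time.sleep(delay_s)
    return "done"


fast = call_with_deadline(fake_request, 1.0, 0.01)
slow = call_with_deadline(fake_request, 0.05, 0.3)
print(fast, slow)
```

The watchdog only reports the hang; it does not cancel the underlying call, so the request's own timeout is still what eventually frees the worker.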

Milvus Log

No response

Anything else?

server config:

    extraConfigFiles:
      user.yaml: |+
        indexCoord:
          scheduler:
            interval: 1
        queryNode:
          mmap:
            vectorField: true
            vectorIndex: true
            scalarField: true
            scalarIndex: true
    queryNode:
      resources:
        limits:
          cpu: '32'
          memory: 32Gi
        requests:
          cpu: '16'
          memory: 32Gi
      replicas: 1
      nodeSelector:
        node-role/nvme: 'true'
    indexNode:
      resources:
        limits:
          cpu: '4.0'
          memory: 16Gi
        requests:
          cpu: '2.0'
          memory: 4Gi
      replicas: 8
    dataNode:
      resources:
        limits:
          cpu: '2.0'
          memory: 16Gi
        requests:
          cpu: '2.0'
          memory: 5Gi

client config: fouramf-client-all-vector-types-dql-ddl

    dataset_params:
      metric_type: L2
      dim: 128
      scalars_index:
        int64_1:
          index_type: INVERTED
        varchar_1:
          index_type: INVERTED
      vectors_index:
        float_vector_1:
          index_type: DISKANN
          index_param: {}
          metric_type: IP
        sparse_float_vector:
          index_type: SPARSE_INVERTED_INDEX
          index_param:
            drop_ratio_build: 0.2
          metric_type: IP
        bfloat16_vector:
          index_type: IVF_SQ8
          index_param:
            nlist: 2048
          metric_type: L2
      scalars_params:
        float_vector_1:
          params:
            dim: 768
          other_params:
            dataset: laion2b_multi
            column_name: float32_vector
        sparse_float_vector:
          other_params:
            dim: 10000
            sparse_range:
            - 1
            - 20
        bfloat16_vector:
          params:
            dim: 256
      dataset_name: sift
      dataset_size: 20m
      ni_per: 10000
    collection_params:
      other_fields:
      - float_vector_1
      - sparse_float_vector
      - bfloat16_vector
      - int64_1
      - varchar_1
      shards_num: 2
    index_params:
      index_type: HNSW
      index_param:
        M: 8
        efConstruction: 200
    concurrent_params:
      concurrent_number: 20
      during_time: 24h
      interval: 20
    concurrent_tasks:
    - type: scene_hybrid_search_test
      weight: 1
      params:
        nq: 2
        top_k: 5
        reqs:
        - search_param:
            nprobe: 128
          anns_field: float_vector
          expr: bool_1 == True
          top_k: 100
        - search_param:
            nprobe: 32
          anns_field: binary_vector_scene_hybrid_search_test_1
          expr: bool_1 != True
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float16_vector_scene_hybrid_search_test_2
          expr: int64_1 >= 1500
          top_k: 5
        - search_param:
            drop_ratio_search: 0.1
          anns_field: sparse_float_vector_scene_hybrid_search_test_3
          expr: varchar_1 like "1%"
          top_k: 10
        rerank:
          RRFRanker: []
        output_fields:
        - "*"
        timeout: 600
        random_data: true
        dataset: local
        dim: 128
        shards_num: 2
        data_size: 3000
        nb: 3000
        index_type: IVF_SQ8
        index_param:
          nlist: 2048
        metric_type: L2
        other_fields:
        - binary_vector_scene_hybrid_search_test_1
        - float16_vector_scene_hybrid_search_test_2
        - sparse_float_vector_scene_hybrid_search_test_3
        - int64_1
        - bool_1
        - varchar_1
        replica_number: 1
        scalars_params:
          binary_vector_scene_hybrid_search_test_1:
            params:
              dim: 512
            other_params:
              dataset: binary
          float16_vector_scene_hybrid_search_test_2:
            params:
              dim: 64
        scalars_index:
          int64_1: {}
          bool_1:
            index_type: BITMAP
          varchar_1:
            index_type: INVERTED
        vectors_index:
          binary_vector_scene_hybrid_search_test_1:
            index_type: BIN_IVF_FLAT
            index_param:
              nlist: 2048
            metric_type: JACCARD
          float16_vector_scene_hybrid_search_test_2:
            index_type: DISKANN
            index_param: {}
            metric_type: IP
          sparse_float_vector_scene_hybrid_search_test_3:
            index_type: SPARSE_WAND
            index_param:
              drop_ratio_build: 0.2
            metric_type: IP
        hybrid_search_counts: 10
    - type: scene_test
      weight: 1
      params:
        dim: 128
        data_size: 3000
        nb: 3000
        index_type: IVF_SQ8
        index_param:
          nlist: 2048
        metric_type: L2
    - type: scene_test_partition_hybrid_search
      weight: 1
      params:
        nq: 1
        top_k: 1
        reqs:
        - search_param:
            ef: 32
          anns_field: float_vector
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float_vector_1
          top_k: 10
        - search_param:
            drop_ratio_search: 0.3
          anns_field: sparse_float_vector
          top_k: 30
        - search_param:
            nprobe: 16
          anns_field: bfloat16_vector
          top_k: 400
        rerank:
          RRFRanker: []
        output_fields:
        - "*"
        timeout: 6000
        random_data: true
        hybrid_search_counts: 10
        data_size: 3000
        ni: 3000
    - type: search
      weight: 1
      params:
        nq: 1000
        top_k: 1
        search_param:
          nprobe: 1000
        expr: int64_1 >= 0
        timeout: 6000
        random_data: true
        partition_names:
        - _default
    - type: hybrid_search
      weight: 1
      params:
        nq: 1
        top_k: 100
        reqs:
        - search_param:
            ef: 32
          anns_field: float_vector
          expr: int64_1 > 100000
          top_k: 10
        - search_param:
            search_list: 30
          anns_field: float_vector_1
          expr: id < 900000
          top_k: 10
        - search_param:
            drop_ratio_search: 0.3
          anns_field: sparse_float_vector
          expr: varchar_1 > "1"
          top_k: 30
        - search_param:
            nprobe: 16
          anns_field: bfloat16_vector
          top_k: 400
        rerank:
          WeightedRanker:
          - 0.85
          - 0.95
          - 0.51
          - 0.32
        output_fields:
        - "*"
        partition_names:
        - _default
        timeout: 6000
        random_data: true
    - type: query
      weight: 1
      params:
        expr: 'int64_1 > -1 && '
        output_fields:
        - "*"
        partition_names:
        - _default
        limit: 10
        timeout: 6000
        custom_expr: " {0} < id < {0} + 1000000"
        custom_range:
        - 0
        - 20000000
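The `expr` of the `query` task above ends with a dangling `&&`, which suggests it is meant to be joined with `custom_expr`. The sketch below shows one plausible reading. It is an assumption, because the benchmark tool's real behavior is not shown in this issue: a start value is drawn from `custom_range`, substituted into the `{0}` placeholders, and the result is appended to `expr`. The function name `build_query_expr` is hypothetical.

```python
import random


def build_query_expr(expr: str, custom_expr: str, start: int) -> str:
    """Append the filled-in template to the base expression.

    Assumption: `{0}` in custom_expr is a str.format placeholder.
    """
    return expr + custom_expr.format(start)


# Values copied from the `query` task parameters above.
expr = "int64_1 > -1 && "
custom_expr = " {0} < id < {0} + 1000000"
custom_range = (0, 20000000)

# Assumption: the start value is sampled uniformly from custom_range.
start = random.randint(custom_range[0], custom_range[1])
print(build_query_expr(expr, custom_expr, start))
```

Under this reading, each query filters on a window of `id` values 1,000,000 wide, placed at a random offset in the range.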
@wangting0128 wangting0128 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. test/benchmark benchmark test labels Dec 20, 2024
@wangting0128 wangting0128 added this to the 2.5.0 milestone Dec 20, 2024
@yanliang567
Contributor

/assign @sunby
/unassign

@sre-ci-robot sre-ci-robot assigned sunby and unassigned yanliang567 Dec 20, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 20, 2024
@yanliang567 yanliang567 modified the milestones: 2.5.0, 2.5.1 Dec 24, 2024
@xiaofan-luan
Collaborator

/assign @aoiasd
Please help with this.

@aoiasd
Contributor

aoiasd commented Dec 27, 2024

The problem started on the 19th at 15:58, and the proxy began reporting "disk not found" at the same time.
[image]
So the disk may have had a problem.
But the proxy disk error is not the main cause of the stall.
Some channels, such as by-dev-rootcoord-dml_1_454664015923904872v1, cannot fetch messages from the dispatcher, which causes queries and searches to get stuck and time out (after 1h30min, which seems too long; possibly an error?).
Other channels still work normally, such as by-dev-rootcoord-dml_1_454664015923904872v0.
[image]

@aoiasd
Contributor

aoiasd commented Dec 27, 2024

But rootcoord is working normally, so the problem seems to be in the dispatcher or msgstream.

@aoiasd
Contributor

aoiasd commented Dec 27, 2024

Another problem: the querynode tt (time tick) lag metric is reported as the current time minus the message tt, and it is labeled by collection ID. As a result, the metric looks healthy when consumption on one channel is stuck but the other channels still work.
We should report the message tt itself and calculate the tt lag in Prometheus. Will fix later.
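A minimal sketch of why a per-collection lag gauge can hide a stuck channel. This is a simplified model, not the Milvus metric code. It assumes the lag is written only when a message is consumed and that all channels of a collection share one label. The channel names, the collection label `coll`, and the sample timestamps (in milliseconds) are hypothetical.

```python
from typing import Dict, List, Tuple

# (report_time_ms, collection_id, channel, msg_tt_ms); hypothetical sample data
Event = Tuple[int, str, str, int]


def lag_gauge_by_collection(events: List[Event]) -> Dict[str, int]:
    """Gauge labeled by collection ID: the last consumed message wins."""
    gauge: Dict[str, int] = {}
    for report_time, collection_id, _channel, msg_tt in events:
        gauge[collection_id] = report_time - msg_tt
    return gauge


def last_tt_by_channel(events: List[Event]) -> Dict[str, int]:
    """Export the raw message tt per channel instead of a precomputed lag."""
    last: Dict[str, int] = {}
    for _report_time, _collection_id, channel, msg_tt in events:
        last[channel] = max(msg_tt, last.get(channel, msg_tt))
    return last


def lag_at(now: int, last_tt: Dict[str, int]) -> Dict[str, int]:
    """Compute lag at read time, as a monitoring system could do."""
    return {channel: now - tt for channel, tt in last_tt.items()}


events = [
    (100_200, "coll", "dml_1_v0", 100_000),
    (100_200, "coll", "dml_1_v1", 100_000),  # last message v1 ever consumes
    (150_200, "coll", "dml_1_v0", 150_000),
    (200_200, "coll", "dml_1_v0", 200_000),
]

collection_view = lag_gauge_by_collection(events)
channel_view = lag_at(200_200, last_tt_by_channel(events))
print(collection_view)
print(channel_view)
```

In this model the per-collection gauge only reflects the healthy channel, while the per-channel view keeps growing for the stuck one. In Prometheus, the read-time subtraction could be done with the built-in `time()` function against the exported timestamp.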
