[Bug]: [benchmark][cluster] Create collection and create index raise error context deadline exceeded in concurrent dql & ddl scene #38147

Open
1 task done
wangting0128 opened this issue Dec 2, 2024 · 12 comments
Labels
kind/bug (issues or changes related to a bug), test/benchmark (benchmark test), triage/accepted (indicates an issue or PR is ready to be actively worked on)

Comments

@wangting0128

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20241202-4c623ceb-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc124
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-bitmap-scenes-jkgsc
test case name: test_bitmap_locust_dql_ddl_cluster

server:

NAME                                                              READY   STATUS        RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-bitmap-scenes-jkgsc-4-etcd-0                              1/1     Running       0                4h38m   10.104.33.212   4am-node36   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-etcd-1                              1/1     Running       0                4h38m   10.104.25.41    4am-node30   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-etcd-2                              1/1     Running       0                4h38m   10.104.21.193   4am-node24   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-milvus-datanode-544bc44b99-95vfr    1/1     Running       3 (4h37m ago)    4h38m   10.104.27.247   4am-node31   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-milvus-indexnode-5959955574-5jfz2   1/1     Running       3 (4h37m ago)    4h38m   10.104.18.146   4am-node25   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-milvus-indexnode-5959955574-bc6qv   1/1     Running       3 (4h37m ago)    4h38m   10.104.26.140   4am-node32   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-milvus-mixcoord-74c667ccd5-lg8dz    1/1     Running       3 (4h37m ago)    4h38m   10.104.26.139   4am-node32   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-milvus-proxy-6fd4464c6c-tkdg9       1/1     Running       3 (4h37m ago)    4h38m   10.104.14.254   4am-node18   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-milvus-querynode-fc865745b-bxddw    1/1     Running       3 (4h37m ago)    4h38m   10.104.24.192   4am-node29   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-milvus-querynode-fc865745b-ckbmv    1/1     Running       3 (4h37m ago)    4h38m   10.104.16.10    4am-node21   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-minio-0                             1/1     Running       0                4h38m   10.104.33.205   4am-node36   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-minio-1                             1/1     Running       0                4h38m   10.104.25.37    4am-node30   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-minio-2                             1/1     Running       0                4h38m   10.104.32.31    4am-node39   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-minio-3                             1/1     Running       0                4h38m   10.104.21.189   4am-node24   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-bookie-0                   1/1     Running       0                4h38m   10.104.32.28    4am-node39   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-bookie-1                   1/1     Running       0                4h38m   10.104.25.40    4am-node30   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-bookie-2                   1/1     Running       0                4h38m   10.104.33.213   4am-node36   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-bookie-init-ksfhr          0/1     Completed     0                4h38m   10.104.14.3     4am-node18   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-broker-0                   1/1     Running       0                4h38m   10.104.14.4     4am-node18   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-broker-1                   1/1     Running       0                4h38m   10.104.33.194   4am-node36   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-proxy-0                    1/1     Running       0                4h38m   10.104.14.5     4am-node18   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-proxy-1                    1/1     Running       0                4h38m   10.104.21.176   4am-node24   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-pulsar-init-jtbm2          0/1     Completed     0                4h38m   10.104.14.2     4am-node18   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-recovery-0                 1/1     Running       0                4h38m   10.104.13.90    4am-node16   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-zookeeper-0                1/1     Running       0                4h38m   10.104.33.206   4am-node36   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-zookeeper-1                1/1     Running       0                4h38m   10.104.32.30    4am-node39   <none>           <none>
fouramf-bitmap-scenes-jkgsc-4-pulsarv3-zookeeper-2                1/1     Running       0                4h38m   10.104.21.187   4am-node24   <none>           <none> 

{pod=~"fouramf-bitmap-scenes-jkgsc-4-milvus-.*"} |~ "1d43c1f2684b5c97ed37c36e6b95e766"
Screenshot 2024-12-02 20 00 40

{pod=~"fouramf-bitmap-scenes-jkgsc-4-milvus-.*"} |~ "2e393503a3a6e3ab1c95630c07f4ced2"
Screenshot 2024-12-02 20 03 41

client log:

[2024-12-02 10:36:48,365 - DEBUG - fouram]: (api_request)  : [Collection] args: ['scene_test_TEwO87kD', {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': False}, 'default'], kwargs: {'shards_num': 2}, [requestId: 558e5c5c-b099-11ef-b35b-e607a1c4bfe6] (api_request.py:77)
[2024-12-02 10:37:06,711 - ERROR - fouram]: RPC error: [create_collection], <MilvusException: (code=10001, message=context deadline exceeded)>, <Time:{'RPC start': '2024-12-02 10:36:56.701808', 'RPC error': '2024-12-02 10:37:06.711652'}> (decorators.py:140)

[2024-12-02 10:37:39,958 - DEBUG - fouram]: (api_request)  : [Index] args: [<Collection>:
-------------
<name>: scene_test_xq8yNmZi
<description>: 
<schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': False}
, 'float_vector', {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 2048}}], kwargs: {}, [requestId: 744ed2f2-b099-11ef-b35b-e607a1c4bfe6] (api_request.py:77)
[2024-12-02 10:37:47,102 - ERROR - fouram]: RPC error: [create_index], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:563 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:577 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:117 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:205 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).describeCollectionInternal
/workspace/source/internal/distributed/rootcoord/client/client.go:211 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollectionInternal
/workspace/source/internal/datacoord/broker/coordinator_broker.go:59 github.com/milvus-io/milvus/internal/datacoord/broker.(*coordinatorBroker).DescribeCollectionInternal
/workspace/source/internal/datacoord/index_service.go:167 github.com/milvus-io/milvus/internal/datacoord.(*Server).getFieldNameByID
/workspace/source/internal/datacoord/index_service.go:204 github.com/milvus-io/milvus/internal/datacoord.(*Server).CreateIndex
/workspace/source/internal/distributed/datacoord/service.go:459 github.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).CreateIndex: rpc error: code = DeadlineExceeded desc = context deadline exceeded)>, <Time:{'RPC start': '2024-12-02 10:37:39.958416', 'RPC error': '2024-12-02 10:37:47.102895'}> (decorators.py:140)
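
For reference, the time each failed RPC spent before the deadline error can be read off the `RPC start` / `RPC error` timestamps in the two ERROR lines above. A minimal sketch of that arithmetic follows; the helper name `rpc_elapsed_seconds` is made up for illustration and is not part of fouram or pymilvus.

```python
from datetime import datetime

# Timestamp layout used by the 'RPC start' / 'RPC error' fields in the client log.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def rpc_elapsed_seconds(start: str, error: str) -> float:
    """Return the seconds between 'RPC start' and 'RPC error' (hypothetical helper)."""
    return (datetime.strptime(error, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Values copied verbatim from the two ERROR lines above.
create_collection_s = rpc_elapsed_seconds(
    "2024-12-02 10:36:56.701808", "2024-12-02 10:37:06.711652"
)
create_index_s = rpc_elapsed_seconds(
    "2024-12-02 10:37:39.958416", "2024-12-02 10:37:47.102895"
)

print(f"create_collection: {create_collection_s:.6f}s")  # 10.009844s
print(f"create_index:      {create_index_s:.6f}s")       # 7.144479s
```

So create_collection failed roughly 10.0 s after the RPC started, and create_index roughly 7.1 s after.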

Expected Behavior

No response

Steps To Reproduce

Concurrent test that measures response time (RT) and QPS

        :purpose:  `primary key: INT64`
            1. building `BITMAP` index on all supported 12 scalar fields, `INVERTED` index on pk field
            2. 2 fields of different vector types
            3. verify DQL & DML requests

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim
                'float_vector_1': 768dim
                'id': primary key type is INT64

                all scalar fields: varchar max_length=100, array max_capacity=13
            2. build indexes:
                HNSW: 'float_vector'
                IVF_SQ8: 'float_vector_1'

                BITMAP: all scalar fields
                INVERTED: 'id' primary key field
            3. insert 10 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - search
                - query
                - hybrid_search 
                - scene_test <- raises error
                    (collection: create->insert->flush->index->drop)
                - scene_search_test
                    (collection: create->insert->flush->index->load->search->drop)
                - scene_hybrid_search_test: 4 vector fields, 3 scalar fields
                    (collection: create->insert->flush->index->load->hybrid_search->drop)
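
A rough single-user sketch of the failing `scene_test` flow (create->insert->flush->index->drop), assuming the pymilvus ORM API. The schema, `shards_num`, and index parameters are taken from the client log and test params in this report; the function name, the `uri`, the collection name, the random vectors, and the per-call `timeout` are assumptions for illustration. This is not the fouram implementation, and a single run is not expected to reproduce the error, which was seen only with 30 concurrent users alongside search/query/hybrid_search.

```python
import random

# Index params as they appear in the failing create_index request in the client log.
SCENE_INDEX_PARAMS = {
    "index_type": "IVF_SQ8",
    "metric_type": "L2",
    "params": {"nlist": 2048},
}


def run_scene_test(uri="http://localhost:19530", name="scene_test_demo",
                   dim=128, nb=3000, timeout=60):
    """One create->insert->flush->index->drop cycle (hypothetical helper)."""
    # Imported lazily so this module still loads where pymilvus is not installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections)

    # uri is an assumption; the report's server listens on port 19530.
    connections.connect(alias="default", uri=uri)

    schema = CollectionSchema(
        fields=[
            FieldSchema(name="id", dtype=DataType.INT64,
                        is_primary=True, auto_id=False),
            FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
        ],
        enable_dynamic_field=False,
    )
    # create: this is the step that returned "context deadline exceeded"
    collection = Collection(name=name, schema=schema, shards_num=2, timeout=timeout)
    try:
        # insert: column-based data in schema field order (random vectors are an assumption)
        ids = list(range(nb))
        vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
        collection.insert([ids, vectors], timeout=timeout)
        # flush
        collection.flush(timeout=timeout)
        # index: this is the other step that returned DeadlineExceeded
        collection.create_index("float_vector", SCENE_INDEX_PARAMS, timeout=timeout)
    finally:
        # drop: always clean up the temporary collection
        collection.drop(timeout=timeout)
```

Calling `run_scene_test()` against a running cluster performs one cycle; the benchmark runs many such cycles in parallel with the DQL tasks listed above.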

Milvus Log

No response

Anything else?

test result:

[2024-12-02 10:54:15,121 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-12-02 10:54:15,122 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: grpc     hybrid_search                                                                   1430     0(0.00%) |   2836    1529   20456   2200 |    0.13        0.00 (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: grpc     query                                                                           1426     0(0.00%) |    218      62   15210     85 |    0.13        0.00 (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: grpc     scene_hybrid_search_test                                                        1406     0(0.00%) |  88980   11053  295281  80000 |    0.13        0.00 (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: grpc     scene_search_test                                                               1527     0(0.00%) |  52275    9014  219368  47000 |    0.14        0.00 (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: grpc     scene_test                                                                      1420     2(0.14%) |  77879   18349  160978  78000 |    0.13        0.00 (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: grpc     search                                                                          1419     0(0.00%) |   1785    1142   14874   1500 |    0.13        0.00 (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]:          Aggregated                                                                      8628     2(0.02%) |  37369      62  295281  14000 |    0.80        0.00 (stats.py:789)
[2024-12-02 10:54:15,122 -  INFO - fouram]:  (stats.py:790)
[2024-12-02 10:54:15,130 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_8c16m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '8.0', 'memory': '16Gi'}, 'requests': {'cpu': '5.0', 'memory': '9Gi'}}, 'replicas': 2},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0', 'memory': '8Gi'}, 'requests': {'cpu': '5.0', 'memory': '5Gi'}}, 'replicas': 2},
                       'dataNode': {'resources': {'limits': {'cpu': '8.0', 'memory': '16Gi'}, 'requests': {'cpu': '5.0', 'memory': '9Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsarv3': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True, 'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus', 'tag': 'master-20241202-4c623ceb-amd64'}}},
            'host': 'fouramf-bitmap-scenes-jkgsc-4-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_bitmap_locust_dql_ddl_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'max_length': 100,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int8_1': {'index_type': 'BITMAP'},
                                                                      'int16_1': {'index_type': 'BITMAP'},
                                                                      'int32_1': {'index_type': 'BITMAP'},
                                                                      'int64_1': {'index_type': 'BITMAP'},
                                                                      'varchar_1': {'index_type': 'BITMAP'},
                                                                      'bool_1': {'index_type': 'BITMAP'},
                                                                      'array_int8_1': {'index_type': 'BITMAP'},
                                                                      'array_int16_1': {'index_type': 'BITMAP'},
                                                                      'array_int32_1': {'index_type': 'BITMAP'},
                                                                      'array_int64_1': {'index_type': 'BITMAP'},
                                                                      'array_varchar_1': {'index_type': 'BITMAP'},
                                                                      'array_bool_1': {'index_type': 'BITMAP'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'IVF_SQ8',
                                                                                         'index_param': {'nlist': 1024},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'array_int8_1': {'params': {'max_capacity': 13},
                                                                                        'other_params': {'dataset': 'random_algorithm',
                                                                                                         'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                              'specify_range': [-128, 128],
                                                                                                                              'max_capacity': 13}}},
                                                                       'array_int16_1': {'params': {'max_capacity': 13},
                                                                                         'other_params': {'dataset': 'random_algorithm',
                                                                                                          'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                               'specify_range': [-200, 200],
                                                                                                                               'max_capacity': 13}}},
                                                                       'array_int32_1': {'params': {'max_capacity': 13},
                                                                                         'other_params': {'dataset': 'random_algorithm',
                                                                                                          'algorithm_params': {'algorithm_name': 'specify_scope',
                                                                                                                               'specify_range': [-300, 300],
                                                                                                                               'max_capacity': 13}}},
                                                                       'array_int64_1': {'params': {'max_capacity': 13},
                                                                                         'other_params': {'dataset': 'random_algorithm',
                                                                                                          'algorithm_params': {'algorithm_name': 'fixed_value_range',
                                                                                                                               'specify_range': [-400, 432],
                                                                                                                               'batch': 50,
                                                                                                                               'max_capacity': 13}}},
                                                                       'array_varchar_1': {'params': {'max_capacity': 13},
                                                                                           'other_params': {'dataset': 'random_algorithm',
                                                                                                            'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                                 'specify_range': [-1500, 1500],
                                                                                                                                 'max_capacity': 13}}},
                                                                       'array_bool_1': {'params': {'max_capacity': 13}},
                                                                       'int8_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                   'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                        'specify_range': [-128, 128],
                                                                                                                        'max_capacity': 13}}},
                                                                       'int16_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                    'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                         'specify_range': [-200, 200],
                                                                                                                         'max_capacity': 13}}},
                                                                       'int32_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                    'algorithm_params': {'algorithm_name': 'specify_scope',
                                                                                                                         'specify_range': [-300, 300],
                                                                                                                         'max_capacity': 13}}},
                                                                       'int64_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                    'algorithm_params': {'algorithm_name': 'fixed_value_range',
                                                                                                                         'specify_range': [-400, 432],
                                                                                                                         'batch': 50,
                                                                                                                         'max_capacity': 13}}},
                                                                       'varchar_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                      'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                           'specify_range': [-1500, 1500],
                                                                                                                           'max_capacity': 13}}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 10000000,
                                                    'ni_per': 5000},
                                 'collection_params': {'other_fields': ['float_vector_1', 'int8_1', 'int16_1', 'int32_1', 'int64_1', 'varchar_1', 'bool_1',
                                                                        'array_int8_1', 'array_int16_1', 'array_int32_1', 'array_int64_1', 'array_varchar_1',
                                                                        'array_bool_1'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False, 'reset_db': False},
                                 'index_params': {'index_type': 'HNSW', 'index_param': {'M': 8, 'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 30, 'during_time': '3h', 'interval': 20, 'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 10,
                                                                  'search_param': {'nprobe': 16},
                                                                  'expr': 'int8_1 == 100',
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': ['id', 'float_vector', 'int64_1'],
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 60,
                                                                  'random_data': True,
                                                                  'check_task': 'check_search_output',
                                                                  'check_items': {'nq': 1000}}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'int64_1 > -1',
                                                                  'output_fields': ['*'],
                                                                  'offset': None,
                                                                  'limit': 10,
                                                                  'ignore_growing': False,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'consistency_level': None,
                                                                  'random_data': False,
                                                                  'random_count': 0,
                                                                  'random_range': [0, 1],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64',
                                                                  'custom_expr': None,
                                                                  'custom_range': [0, 1],
                                                                  'check_task': 'check_query_output',
                                                                  'check_items': {'expect_length': 10}}},
                                                      {'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 10,
                                                                  'top_k': 10,
                                                                  'reqs': [{'search_param': {'ef': 32},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': '(array_contains_any(array_int32_1, [0]) || array_contains(array_int64_1, '
                                                                                    '1)) || ((varchar_1 like "1%") and (bool_1 == True))',
                                                                            'top_k': 30},
                                                                           {'search_param': {'nprobe': 64},
                                                                            'anns_field': 'float_vector_1',
                                                                            'expr': 'not (int16_1 == int8_1) && ARRAY_CONTAINS_ANY(array_int64_1, [-1, 0, '
                                                                                    '1])'}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True,
                                                                  'check_task': 'check_search_output',
                                                                  'check_items': {'output_fields': ['float_vector_1', 'int8_1', 'int16_1', 'int32_1', 'int64_1',
                                                                                                    'varchar_1', 'bool_1', 'array_int8_1', 'array_int16_1',
                                                                                                    'array_int32_1', 'array_int64_1', 'array_varchar_1',
                                                                                                    'array_bool_1', 'id', 'float_vector'],
                                                                                  'nq': 10}}},
                                                      {'type': 'scene_test',
                                                       'weight': 1,
                                                       'params': {'dim': 128,
                                                                  'data_size': 3000,
                                                                  'nb': 3000,
                                                                  'index_type': 'IVF_SQ8',
                                                                  'index_param': {'nlist': 2048},
                                                                  'metric_type': 'L2',
                                                                  'other_fields': [],
                                                                  'scalars_params': {},
                                                                  'scalars_index': {},
                                                                  'vectors_index': {}}},
                                                      {'type': 'scene_search_test',
                                                       'weight': 1,
                                                       'params': {'dataset': 'local',
                                                                  'dim': 128,
                                                                  'shards_num': 2,
                                                                  'data_size': 3000,
                                                                  'nb': 3000,
                                                                  'index_type': 'IVF_SQ8',
                                                                  'index_param': {'nlist': 2048},
                                                                  'metric_type': 'L2',
                                                                  'other_fields': ['array_int64_1', 'array_bool_1', 'array_varchar_1'],
                                                                  'replica_number': 1,
                                                                  'nq': 1,
                                                                  'top_k': 10,
                                                                  'search_param': {'nprobe': 16},
                                                                  'search_counts': 10,
                                                                  'scalars_params': {},
                                                                  'scalars_index': {'array_int64_1': {'index_type': 'BITMAP'},
                                                                                    'array_bool_1': {'index_type': 'BITMAP'},
                                                                                    'array_varchar_1': {'index_type': 'BITMAP'}},
                                                                  'vectors_index': {},
                                                                  'prepare_before_insert': False,
                                                                  'new_connect': False,
                                                                  'new_user': False}},
                                                      {'type': 'scene_hybrid_search_test',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 1,
                                                                  'reqs': [{'search_param': {'nprobe': 128},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'bool_1 == True',
                                                                            'top_k': 100},
                                                                           {'search_param': {'nprobe': 32},
                                                                            'anns_field': 'binary_vector_scene_hybrid_search_test_1',
                                                                            'expr': 'bool_1 != True',
                                                                            'top_k': 10},
                                                                           {'search_param': {'search_list': 30},
                                                                            'anns_field': 'float16_vector_scene_hybrid_search_test_2',
                                                                            'expr': 'int64_1 >= 1500',
                                                                            'top_k': 5},
                                                                           {'search_param': {'drop_ratio_search': 0.1},
                                                                            'anns_field': 'sparse_float_vector_scene_hybrid_search_test_3',
                                                                            'expr': 'varchar_1 like "1%"',
                                                                            'top_k': 10}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 600,
                                                                  'random_data': True,
                                                                  'dataset': 'local',
                                                                  'dim': 128,
                                                                  'shards_num': 2,
                                                                  'data_size': 3000,
                                                                  'nb': 3000,
                                                                  'index_type': 'IVF_SQ8',
                                                                  'index_param': {'nlist': 2048},
                                                                  'metric_type': 'L2',
                                                                  'other_fields': ['binary_vector_scene_hybrid_search_test_1',
                                                                                   'float16_vector_scene_hybrid_search_test_2',
                                                                                   'sparse_float_vector_scene_hybrid_search_test_3', 'int64_1', 'bool_1',
                                                                                   'varchar_1'],
                                                                  'replica_number': 1,
                                                                  'scalars_params': {'binary_vector_scene_hybrid_search_test_1': {'params': {'dim': 512},
                                                                                                                                  'other_params': {'dataset': 'binary'}},
                                                                                     'float16_vector_scene_hybrid_search_test_2': {'params': {'dim': 64}}},
                                                                  'scalars_index': {'int64_1': {},
                                                                                    'bool_1': {'index_type': 'BITMAP'},
                                                                                    'varchar_1': {'index_type': 'BITMAP'}},
                                                                  'vectors_index': {'binary_vector_scene_hybrid_search_test_1': {'index_type': 'BIN_IVF_FLAT',
                                                                                                                                 'index_param': {'nlist': 2048},
                                                                                                                                 'metric_type': 'JACCARD'},
                                                                                    'float16_vector_scene_hybrid_search_test_2': {'index_type': 'DISKANN',
                                                                                                                                  'index_param': {},
                                                                                                                                  'metric_type': 'IP'},
                                                                                    'sparse_float_vector_scene_hybrid_search_test_3': {'index_type': 'SPARSE_WAND',
                                                                                                                                       'index_param': {'drop_ratio_build': 0.2},
                                                                                                                                       'metric_type': 'IP'}},
                                                                  'prepare_before_insert': False,
                                                                  'hybrid_search_counts': 10,
                                                                  'new_connect': False,
                                                                  'new_user': False}}]},
            'run_id': 2024120202088983,
            'datetime': '2024-12-02 06:16:48.567055',
            'client_version': '2.5.0'},
 'result': {'test_result': {'index': {'RT': 2374.6547,
                                      'float_vector_1': {'RT': 1120.1665},
                                      'id': {'RT': 686.9565},
                                      'int8_1': {'RT': 206.0543},
                                      'int16_1': {'RT': 15.9358},
                                      'int32_1': {'RT': 1.0517},
                                      'int64_1': {'RT': 1.0327},
                                      'varchar_1': {'RT': 8.8019},
                                      'bool_1': {'RT': 0.5225},
                                      'array_int8_1': {'RT': 0.5635},
                                      'array_int16_1': {'RT': 0.5224},
                                      'array_int32_1': {'RT': 0.5284},
                                      'array_int64_1': {'RT': 0.533},
                                      'array_varchar_1': {'RT': 0.5214},
                                      'array_bool_1': {'RT': 0.5326}},
                            'insert': {'total_time': 895.4146, 'VPS': 11168.0109, 'batch_time': 0.4477, 'batch': 5000},
                            'flush': {'RT': 3.0265},
                            'load': {'RT': 10.6694},
                            'Locust': {'Aggregated': {'Requests': 8628,
                                                      'Fails': 2,
                                                      'RPS': 0.8,
                                                      'fail_s': 0.0,
                                                      'RT_max': 295281.85,
                                                      'RT_avg': 37369.38,
                                                      'TP50': 14000.0,
                                                      'TP99': 166000.0},
                                       'hybrid_search': {'Requests': 1430,
                                                         'Fails': 0,
                                                         'RPS': 0.13,
                                                         'fail_s': 0.0,
                                                         'RT_max': 20456.95,
                                                         'RT_avg': 2836.71,
                                                         'TP50': 2200.0,
                                                         'TP99': 8700.0},
                                       'query': {'Requests': 1426,
                                                 'Fails': 0,
                                                 'RPS': 0.13,
                                                 'fail_s': 0.0,
                                                 'RT_max': 15210.24,
                                                 'RT_avg': 218.54,
                                                 'TP50': 85,
                                                 'TP99': 2000.0},
                                       'scene_hybrid_search_test': {'Requests': 1406,
                                                                    'Fails': 0,
                                                                    'RPS': 0.13,
                                                                    'fail_s': 0.0,
                                                                    'RT_max': 295281.85,
                                                                    'RT_avg': 88980.68,
                                                                    'TP50': 80000.0,
                                                                    'TP99': 228000.0},
                                       'scene_search_test': {'Requests': 1527,
                                                             'Fails': 0,
                                                             'RPS': 0.14,
                                                             'fail_s': 0.0,
                                                             'RT_max': 219368.09,
                                                             'RT_avg': 52275.85,
                                                             'TP50': 47000.0,
                                                             'TP99': 144000.0},
                                       'scene_test': {'Requests': 1420,
                                                      'Fails': 2,
                                                      'RPS': 0.13,
                                                      'fail_s': 0.0,
                                                      'RT_max': 160978.74,
                                                      'RT_avg': 77879.42,
                                                      'TP50': 78000.0,
                                                      'TP99': 97000.0},
                                       'search': {'Requests': 1419,
                                                  'Fails': 0,
                                                  'RPS': 0.13,
                                                  'fail_s': 0.0,
                                                  'RT_max': 14874.11,
                                                  'RT_avg': 1785.78,
                                                  'TP50': 1500.0,
                                                  'TP99': 4700.0}}}}}
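
As a quick sanity check on the Locust numbers above, the per-task counts can be summed and compared with the `Aggregated` entry. This is only a sketch: the figures are copied by hand from the result dict, and the variable names are made up for illustration, not part of the benchmark framework.

```python
# Per-task (Requests, Fails) pairs copied from the 'Locust' section of the result above.
per_task = {
    "hybrid_search": (1430, 0),
    "query": (1426, 0),
    "scene_hybrid_search_test": (1406, 0),
    "scene_search_test": (1527, 0),
    "scene_test": (1420, 2),
    "search": (1419, 0),
}
# (Requests, Fails) from the 'Aggregated' entry.
aggregated = (8628, 2)

total_requests = sum(requests for requests, _ in per_task.values())
total_fails = sum(fails for _, fails in per_task.values())
failing_tasks = [name for name, (_, fails) in per_task.items() if fails > 0]

# The per-task sums should match the aggregated row.
print((total_requests, total_fails) == aggregated)
# Only tasks with at least one failure are listed.
print(failing_tasks)
# Failure ratio as a percentage; small enough that 'fail_s' is reported as 0.0.
print(f"{total_fails / total_requests:.4%}")
```

Both failures fall under `scene_test`, the only task in this run that mixes collection and index creation with the concurrent DQL load.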
@wangting0128 added the kind/bug, needs-triage, and test/benchmark labels on Dec 2, 2024
@wangting0128 added this to the 2.5.0 milestone on Dec 2, 2024
@yanliang567
Contributor

/assign @liliu-z
/unassign

@sre-ci-robot assigned liliu-z and unassigned yanliang567 on Dec 3, 2024
@yanliang567 added the triage/accepted label and removed the needs-triage label on Dec 3, 2024
@wangting0128
Contributor Author

A different test case hits the same error.

argo task: multi-vector-corn-1-1733407200
test case name: test_hybrid_search_locust_dql_dml_partition_hybrid_search_cluster
image: master-20241205-6ff19481-amd64

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1733407200-1-etcd-0                           1/1     Running     0               3h7m    10.104.25.71    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-etcd-1                           1/1     Running     0               3h7m    10.104.17.59    4am-node23   <none>           <none>
multi-vector-corn-1-1733407200-1-etcd-2                           1/1     Running     0               3h7m    10.104.26.186   4am-node32   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-datanode-cf885778czmz2r   1/1     Running     2 (3h6m ago)    3h7m    10.104.33.71    4am-node36   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-indexnode-689684745hgwk   1/1     Running     2 (3h6m ago)    3h7m    10.104.27.95    4am-node31   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-indexnode-68968474dstdc   1/1     Running     2 (3h6m ago)    3h7m    10.104.16.194   4am-node21   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-indexnode-68968474j4ltd   1/1     Running     2 (3h6m ago)    3h7m    10.104.34.72    4am-node37   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-indexnode-68968474k89hn   1/1     Running     2 (3h6m ago)    3h7m    10.104.23.181   4am-node27   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-mixcoord-7b7c6c945xljxh   1/1     Running     2 (3h6m ago)    3h7m    10.104.33.72    4am-node36   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-proxy-5c785456bc-gw7hs    1/1     Running     2 (3h6m ago)    3h7m    10.104.33.73    4am-node36   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-querynode-7d6c87df796qv   1/1     Running     2 (3h6m ago)    3h7m    10.104.27.96    4am-node31   <none>           <none>
multi-vector-corn-1-1733407200-1-milvus-querynode-7d6c87df8brdw   1/1     Running     2 (3h6m ago)    3h7m    10.104.33.74    4am-node36   <none>           <none>
multi-vector-corn-1-1733407200-1-minio-0                          1/1     Running     0               3h7m    10.104.25.70    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-minio-1                          1/1     Running     0               3h7m    10.104.17.58    4am-node23   <none>           <none>
multi-vector-corn-1-1733407200-1-minio-2                          1/1     Running     0               3h7m    10.104.26.185   4am-node32   <none>           <none>
multi-vector-corn-1-1733407200-1-minio-3                          1/1     Running     0               3h7m    10.104.19.190   4am-node28   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-bookie-0                1/1     Running     0               3h7m    10.104.25.73    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-bookie-1                1/1     Running     0               3h7m    10.104.26.184   4am-node32   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-bookie-2                1/1     Running     0               3h7m    10.104.17.63    4am-node23   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-bookie-init-hvjcc       0/1     Completed   0               3h7m    10.104.25.61    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-broker-0                1/1     Running     0               3h7m    10.104.25.60    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-broker-1                1/1     Running     0               3h7m    10.104.17.54    4am-node23   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-proxy-0                 1/1     Running     0               3h7m    10.104.25.63    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-proxy-1                 1/1     Running     0               3h7m    10.104.14.171   4am-node18   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-pulsar-init-bjp7d       0/1     Completed   0               3h7m    10.104.25.62    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-recovery-0              1/1     Running     0               3h7m    10.104.25.64    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-zookeeper-0             1/1     Running     0               3h7m    10.104.25.72    4am-node30   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-zookeeper-1             1/1     Running     0               3h7m    10.104.17.60    4am-node23   <none>           <none>
multi-vector-corn-1-1733407200-1-pulsarv3-zookeeper-2             1/1     Running     0               3h7m    10.104.26.181   4am-node32   <none>           <none>

client logs:

[2024-12-05 17:35:29,029 - ERROR - fouram]: RPC error: [get_flush_state], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, <Time:{'RPC start': '2024-12-05 17:25:46.454556', 'RPC error': '2024-12-05 17:35:29.029033'}> (decorators.py:140)
[2024-12-05 17:35:29,029 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=Retry timeout: 600s, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, <Time:{'RPC start': '2024-12-05 17:22:25.370987', 'RPC error': '2024-12-05 17:35:29.029919'}> (decorators.py:140)
[2024-12-05 17:35:29,030 - ERROR - fouram]: (api_response) : [Partition.flush] <MilvusException: (code=65535, message=Retry timeout: 600s, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, [requestId: 7ec7f6e6-b32d-11ef-9132-0e974cf1deeb] (api_request.py:57)


[2024-12-05 17:35:46,276 - ERROR - fouram]: RPC error: [list_indexes], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, <Time:{'RPC start': '2024-12-05 17:25:46.443684', 'RPC error': '2024-12-05 17:35:46.276819'}> (decorators.py:140)
[2024-12-05 17:35:46,277 - ERROR - fouram]: [func_time_catch] : <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)> (api_request.py:127)

[2024-12-05 17:35:46,692 - ERROR - fouram]: RPC error: [get_index_state], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, <Time:{'RPC start': '2024-12-05 17:25:46.941382', 'RPC error': '2024-12-05 17:35:46.692833'}> (decorators.py:140)
[2024-12-05 17:35:46,693 - ERROR - fouram]: RPC error: [wait_for_creating_index], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, <Time:{'RPC start': '2024-12-05 17:25:46.196234', 'RPC error': '2024-12-05 17:35:46.693056'}> (decorators.py:140)
[2024-12-05 17:35:46,693 - ERROR - fouram]: RPC error: [create_index], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, <Time:{'RPC start': '2024-12-05 17:25:01.758057', 'RPC error': '2024-12-05 17:35:46.693155'}> (decorators.py:140)
[2024-12-05 17:35:46,693 - ERROR - fouram]: (api_response) : [Index] <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:118 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:194 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/proxy/meta_cache.go:718 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).describeCollection
/workspace/source/internal/proxy/meta_cache.go:414 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).update
/workspace/source/internal/proxy/meta_cache.go:513 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).UpdateByName.func1
/workspace/source/pkg/util/conc/singleflight.go:19 github.com/milvus-io/milvus/pkg/util/conc.(*Singleflight[...]).Do.func1
/go/pkg/mod/golang.org/x/[email protected]/singleflight/singleflight.go:198 golang.org/x/sync/singleflight.(*Group).doCall.func2: rpc error: code = Canceled desc = context canceled)>, [requestId: dbfeb7aa-b32d-11ef-9132-0e974cf1deeb] (api_request.py:57)
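
To read how long each failed call was pending before the error surfaced, the `<Time:{...}>` suffix of the client log lines can be parsed directly. The following is a minimal sketch, not part of the test framework: the helper name `rpc_elapsed_seconds` and the variable names are hypothetical, and the two sample strings are copied from the `<Time:...>` suffixes of the error lines above (the second one belongs to the `create_index` error).

```python
import re
from datetime import datetime

# Matches the "<Time:{'RPC start': '...', 'RPC error': '...'}>" suffix of a client log line.
TIME_RE = re.compile(r"'RPC start': '([^']+)', 'RPC error': '([^']+)'")
FMT = "%Y-%m-%d %H:%M:%S.%f"


def rpc_elapsed_seconds(log_line: str) -> float:
    """Return the seconds between 'RPC start' and 'RPC error' in one log line."""
    match = TIME_RE.search(log_line)
    if match is None:
        raise ValueError("no RPC start/error timestamps found")
    start, error = (datetime.strptime(ts, FMT) for ts in match.groups())
    return (error - start).total_seconds()


# Timestamp suffixes copied from the two error lines above.
earlier_error = "<Time:{'RPC start': '2024-12-05 17:25:46.196234', 'RPC error': '2024-12-05 17:35:46.693056'}>"
create_index_error = "<Time:{'RPC start': '2024-12-05 17:25:01.758057', 'RPC error': '2024-12-05 17:35:46.693155'}>"

print(round(rpc_elapsed_seconds(earlier_error), 3))
print(round(rpc_elapsed_seconds(create_index_error), 3))
```

With these inputs the first call was pending for about 600.5 s and the `create_index` call for about 644.9 s, and both errors were logged within the same millisecond. The first figure is close to the `'timeout': 600` value that appears in the task parameters of the report, though whether that timeout applies to these DDL calls is an assumption.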

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DQL & DML(partition)`
            verify concurrent DQL & DML(partition) scenario,
            which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million data into 10 partitions
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - scene_test_partition_hybrid_search <- build index failed
                    (partition: create->insert->flush->index again->load->hybrid_search->release->hybrid_search failed->drop)
                - search
                - hybrid_search
                - query

test result:

[2024-12-05 20:19:45,591 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-12-05 20:19:45,591 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-05 20:19:45,591 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-05 20:19:45,591 -  INFO - fouram]: grpc     hybrid_search                                                                    167   62(37.13%) | 228636     523  600096  18000 |    0.02        0.01 (stats.py:789)
[2024-12-05 20:19:45,591 -  INFO - fouram]: grpc     query                                                                             23    8(34.78%) | 223028     189  600089  42000 |    0.00        0.00 (stats.py:789)
[2024-12-05 20:19:45,591 -  INFO - fouram]: grpc     scene_test_partition_hybrid_search                                                10    5(50.00%) | 511715  149131  872837 254000 |    0.00        0.00 (stats.py:789)
[2024-12-05 20:19:45,591 -  INFO - fouram]: grpc     search                                                                           185   77(41.62%) | 264940   10910  600589  37000 |    0.02        0.01 (stats.py:789)
[2024-12-05 20:19:45,592 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-05 20:19:45,592 -  INFO - fouram]:          Aggregated                                                                       385  152(39.48%) | 253099     189  872837  31000 |    0.04        0.01 (stats.py:789)
[2024-12-05 20:19:45,592 -  INFO - fouram]:  (stats.py:790)
[2024-12-05 20:19:45,596 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_2c8m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '32.0', 'memory': '32Gi'}, 'requests': {'cpu': '17.0', 'memory': '17Gi'}}, 'replicas': 2},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0', 'memory': '8Gi'}, 'requests': {'cpu': '5.0', 'memory': '5Gi'}}, 'replicas': 4},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0', 'memory': '8Gi'}, 'requests': {'cpu': '2.0', 'memory': '5Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsarv3': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True, 'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus', 'tag': 'master-20241205-6ff19481-amd64'}}},
            'host': 'multi-vector-corn-1-1733407200-1-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_hybrid_search_locust_dql_dml_partition_hybrid_search_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {}, 'int64_1': {'index_type': 'INVERTED'}, 'varchar_1': {'index_type': 'INVERTED'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8, 'efConstruction': 200},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_2': {'index_type': 'DISKANN', 'index_param': {}, 'metric_type': 'IP'},
                                                                      'float_vector_3': {'index_type': 'IVF_SQ8',
                                                                                         'index_param': {'nlist': 2048},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'float_vector_1': {'params': {'dim': 128}, 'other_params': {'dataset': 'sift'}},
                                                                       'float_vector_2': {'params': {'dim': 128}, 'other_params': {'dataset': 'sift'}},
                                                                       'float_vector_3': {'params': {'dim': 128}, 'other_params': {'dataset': 'sift'}}},
                                                    'extra_partitions': {'partitions': ['_default', 'partition_1', 'partition_2', 'partition_3', 'partition_4',
                                                                                        'partition_5', 'partition_6', 'partition_7', 'partition_8',
                                                                                        'partition_9'],
                                                                         'data_repeated': False},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 1000000,
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['float_vector_1', 'float_vector_2', 'float_vector_3', 'int64_1', 'varchar_1'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False, 'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20, 'during_time': '3h', 'interval': 20, 'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'scene_test_partition_hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 1,
                                                                  'reqs': [{'search_param': {'nprobe': 128}, 'anns_field': 'float_vector', 'top_k': 100},
                                                                           {'search_param': {'ef': 64}, 'anns_field': 'float_vector_1', 'top_k': 10},
                                                                           {'search_param': {'search_list': 32}, 'anns_field': 'float_vector_2', 'top_k': 30},
                                                                           {'search_param': {'nprobe': 16}, 'anns_field': 'float_vector_3', 'top_k': 400}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'timeout': 600,
                                                                  'random_data': True,
                                                                  'hybrid_search_counts': 1,
                                                                  'data_size': 3000,
                                                                  'ni': 3000}},
                                                      {'type': 'search',
                                                       'weight': 8,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 1,
                                                                  'search_param': {'nprobe': 1000},
                                                                  'expr': 'int64_1 >= 0',
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': ['_default', 'partition_1', 'partition_2', 'partition_3', 'partition_4',
                                                                                      'partition_5', 'partition_6', 'partition_7', 'partition_8',
                                                                                      'partition_9'],
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 600,
                                                                  'random_data': True,
                                                                  'check_task': 'check_response',
                                                                  'check_items': None}},
                                                      {'type': 'hybrid_search',
                                                       'weight': 8,
                                                       'params': {'nq': 1,
                                                                  'top_k': 100,
                                                                  'reqs': [{'search_param': {'nprobe': 128}, 'anns_field': 'float_vector', 'top_k': 100},
                                                                           {'search_param': {'ef': 64}, 'anns_field': 'float_vector_1', 'top_k': 10},
                                                                           {'search_param': {'search_list': 32}, 'anns_field': 'float_vector_2', 'top_k': 30},
                                                                           {'search_param': {'nprobe': 16}, 'anns_field': 'float_vector_3', 'top_k': 400}],
                                                                  'rerank': {'WeightedRanker': [0.85, 0.95, 0.51, 0.32]},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': ['_default', 'partition_1', 'partition_2', 'partition_3', 'partition_4',
                                                                                      'partition_5', 'partition_6', 'partition_7', 'partition_8',
                                                                                      'partition_9'],
                                                                  'timeout': 600,
                                                                  'random_data': True,
                                                                  'check_task': 'check_response',
                                                                  'check_items': None}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'int64_1 > -1 && ',
                                                                  'output_fields': ['*'],
                                                                  'offset': None,
                                                                  'limit': None,
                                                                  'ignore_growing': False,
                                                                  'partition_names': ['_default', 'partition_1', 'partition_2', 'partition_3', 'partition_4',
                                                                                      'partition_5', 'partition_6', 'partition_7', 'partition_8',
                                                                                      'partition_9'],
                                                                  'timeout': 600,
                                                                  'consistency_level': None,
                                                                  'random_data': True,
                                                                  'random_count': 20,
                                                                  'random_range': [0, 100000],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64',
                                                                  'custom_expr': None,
                                                                  'custom_range': [0, 1],
                                                                  'check_task': 'check_response',
                                                                  'check_items': None}}]},
            'run_id': 2024120587899013,
            'datetime': '2024-12-05 17:13:09.114617',
            'client_version': '2.5.0'},
 'result': {'test_result': {'index': {'RT': 21.2388,
                                      'float_vector_1': {'RT': 1.5228},
                                      'float_vector_2': {'RT': 4.0772},
                                      'float_vector_3': {'RT': 0.5143},
                                      'id': {'RT': 0.5136},
                                      'int64_1': {'RT': 0.5149},
                                      'varchar_1': {'RT': 0.5149}},
                            'insert': {'total_time': 144.1291, 'VPS': 6945.7525, 'batch_time': 1.4413, 'batch': 10000.0},
                            'flush': {'RT': 3.0294},
                            'load': {'RT': 3.2556},
                            'Locust': {'Aggregated': {'Requests': 385,
                                                      'Fails': 152,
                                                      'RPS': 0.04,
                                                      'fail_s': 0.39,
                                                      'RT_max': 872837.64,
                                                      'RT_avg': 253099.12,
                                                      'TP50': 31000.0,
                                                      'TP99': 822000.0},
                                       'hybrid_search': {'Requests': 167,
                                                         'Fails': 62,
                                                         'RPS': 0.02,
                                                         'fail_s': 0.37,
                                                         'RT_max': 600096.62,
                                                         'RT_avg': 228636.61,
                                                         'TP50': 18000.0,
                                                         'TP99': 600000.0},
                                       'query': {'Requests': 23,
                                                 'Fails': 8,
                                                 'RPS': 0.0,
                                                 'fail_s': 0.35,
                                                 'RT_max': 600089.24,
                                                 'RT_avg': 223028.47,
                                                 'TP50': 42000.0,
                                                 'TP99': 600000.0},
                                       'scene_test_partition_hybrid_search': {'Requests': 10,
                                                                              'Fails': 5,
                                                                              'RPS': 0.0,
                                                                              'fail_s': 0.5,
                                                                              'RT_max': 872837.64,
                                                                              'RT_avg': 511715.31,
                                                                              'TP50': 680000.0,
                                                                              'TP99': 873000.0},
                                       'search': {'Requests': 185,
                                                  'Fails': 77,
                                                  'RPS': 0.02,
                                                  'fail_s': 0.42,
                                                  'RT_max': 600589.74,
                                                  'RT_avg': 264940.76,
                                                  'TP50': 37000.0,
                                                  'TP99': 600000.0}}}}}
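
Note that `fail_s` in the report does not match the `failures/s` column of the Locust table (0.01 aggregated). It appears to be the failure ratio `Fails / Requests` instead. The sketch below checks that reading against the numbers copied from the report above; the dictionary layout and the helper name `failure_ratio` are illustrative only.

```python
# Requests, Fails and fail_s copied from the 'Locust' section of the report above.
locust = {
    "Aggregated": {"Requests": 385, "Fails": 152, "fail_s": 0.39},
    "hybrid_search": {"Requests": 167, "Fails": 62, "fail_s": 0.37},
    "query": {"Requests": 23, "Fails": 8, "fail_s": 0.35},
    "scene_test_partition_hybrid_search": {"Requests": 10, "Fails": 5, "fail_s": 0.5},
    "search": {"Requests": 185, "Fails": 77, "fail_s": 0.42},
}


def failure_ratio(entry: dict) -> float:
    """Fraction of failed requests, rounded to two decimals like the report."""
    return round(entry["Fails"] / entry["Requests"], 2)


for name, entry in locust.items():
    print(f"{name}: computed={failure_ratio(entry):.2f} reported={entry['fail_s']}")
```

For every task the computed ratio equals the reported `fail_s`, so roughly 35% to 50% of requests of each type failed during this run.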

@xiaofan-luan
Collaborator

@wangting0128
Please use the latest master and retest.
If the issue still exists, we need a pprof, and @SimFG could help.

@SimFG
Contributor

SimFG commented Dec 9, 2024

@xiaofan-luan ok, got it

@wangting0128
Contributor Author

@wangting0128 Please use the latest master and retest. If the issue still exists, we need a pprof, and @SimFG could help.

Got it.
This problem only occurs occasionally. I'll see if it recurs in upcoming tests.

@wangting0128
Contributor Author

Drop collection context deadline exceeded

argo task: multi-vector-corn-1-1733925600
test case name: test_hybrid_search_locust_ddl_dql_cluster
image: master-20241211-304cdc77-amd64

server:

NAME                                                              READY   STATUS      RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1733925600-5-etcd-0                           1/1     Running     0                12h     10.104.19.193   4am-node28   <none>           <none>
multi-vector-corn-1-1733925600-5-etcd-1                           1/1     Running     0                12h     10.104.24.248   4am-node29   <none>           <none>
multi-vector-corn-1-1733925600-5-etcd-2                           1/1     Running     0                12h     10.104.18.77    4am-node25   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-datanode-5876685f6x7qvs   1/1     Running     5 (12h ago)      12h     10.104.16.164   4am-node21   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-indexnode-5576bb6455plp   1/1     Running     3 (12h ago)      12h     10.104.14.148   4am-node18   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-indexnode-5576bb64qwdvk   1/1     Running     5 (12h ago)      12h     10.104.33.147   4am-node36   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-indexnode-5576bb64rm78n   1/1     Running     5 (12h ago)      12h     10.104.13.94    4am-node16   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-indexnode-5576bb64x72nr   1/1     Running     5 (12h ago)      12h     10.104.6.76     4am-node13   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-mixcoord-8fb68bbd54dw9c   1/1     Running     6 (12h ago)      12h     10.104.27.134   4am-node31   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-proxy-6c9c57dc97-bvzmd    1/1     Running     6 (12h ago)      12h     10.104.27.135   4am-node31   <none>           <none>
multi-vector-corn-1-1733925600-5-milvus-querynode-77bbb9ddg2ptv   1/1     Running     5 (12h ago)      12h     10.104.18.53    4am-node25   <none>           <none>
multi-vector-corn-1-1733925600-5-minio-0                          1/1     Running     0                12h     10.104.20.115   4am-node22   <none>           <none>
multi-vector-corn-1-1733925600-5-minio-1                          1/1     Running     0                12h     10.104.24.243   4am-node29   <none>           <none>
multi-vector-corn-1-1733925600-5-minio-2                          1/1     Running     0                12h     10.104.19.194   4am-node28   <none>           <none>
multi-vector-corn-1-1733925600-5-minio-3                          1/1     Running     0                12h     10.104.23.245   4am-node27   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-bookie-0                1/1     Running     0                12h     10.104.20.116   4am-node22   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-bookie-1                1/1     Running     0                12h     10.104.33.205   4am-node36   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-bookie-2                1/1     Running     0                12h     10.104.19.195   4am-node28   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-bookie-init-kfgwh       0/1     Completed   0                12h     10.104.34.151   4am-node37   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-broker-0                1/1     Running     0                12h     10.104.9.203    4am-node14   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-broker-1                1/1     Running     0                12h     10.104.33.149   4am-node36   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-proxy-0                 1/1     Running     0                12h     10.104.19.140   4am-node28   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-proxy-1                 1/1     Running     0                12h     10.104.9.208    4am-node14   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-pulsar-init-9z8cz       0/1     Completed   0                12h     10.104.20.63    4am-node22   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-recovery-0              1/1     Running     0                12h     10.104.9.201    4am-node14   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-zookeeper-0             1/1     Running     0                12h     10.104.19.187   4am-node28   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-zookeeper-1             1/1     Running     0                12h     10.104.33.202   4am-node36   <none>           <none>
multi-vector-corn-1-1733925600-5-pulsarv3-zookeeper-2             1/1     Running     0                12h     10.104.24.246   4am-node29   <none>           <none>

client log:

[2024-12-11 16:45:53,158 - ERROR - fouram]: RPC error: [drop_collection], <MilvusException: (code=10001, message=context deadline exceeded)>, <Time:{'RPC start': '2024-12-11 16:45:38.269987', 'RPC error': '2024-12-11 16:45:53.158101'}> (decorators.py:140)
[2024-12-11 16:45:53,159 - ERROR - fouram]: (api_response) : [drop_collection] <MilvusException: (code=10001, message=context deadline exceeded)>, [requestId: 59b993b2-b7df-11ef-bcdf-66ee494e38b6] (api_request.py:57)
[2024-12-11 16:45:53,159 - ERROR - fouram]: [CheckFunc] drop_collection request check failed, response:<MilvusException: (code=10001, message=context deadline exceeded)> (func_check.py:106)
[2024-12-11 16:45:53,159 - ERROR - fouram]: [func_time_catch] :  (api_request.py:127)
[2024-12-11 16:46:10,822 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-11 16:46:10,822 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-11 16:46:10,822 -  INFO - fouram]: grpc     hybrid_search                                                                    395     0(0.00%) |  78944     592  254805  79000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:46:10,822 -  INFO - fouram]: grpc     query                                                                            401     0(0.00%) |  67856     380  266126  58000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:46:10,822 -  INFO - fouram]: grpc     scene_test                                                                       433     1(0.23%) | 162530   63935  635327 147000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:46:10,822 -  INFO - fouram]: grpc     search                                                                           403     0(0.00%) | 112073   20009  196738 101000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:46:10,822 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-11 16:46:10,822 -  INFO - fouram]:          Aggregated                                                                      1632     1(0.06%) | 106577     380  635327  92000 |    0.00        0.00 (stats.py:789)


[2024-12-11 16:47:25,860 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: failed to flush collection 454538378847791404: etcdserver: request timed out)>, <Time:{'RPC start': '2024-12-11 16:45:38.918727', 'RPC error': '2024-12-11 16:47:25.860092'}> (decorators.py:140)
[2024-12-11 16:47:25,861 - ERROR - fouram]: (api_response) : [Collection.flush] <MilvusException: (code=65535, message=failed to call flush to data coordinator: failed to flush collection 454538378847791404: etcdserver: request timed out)>, [requestId: 5a1c8fda-b7df-11ef-bcdf-66ee494e38b6] (api_request.py:57)
[2024-12-11 16:47:25,861 - ERROR - fouram]: [CheckFunc] flush request check failed, response:<MilvusException: (code=65535, message=failed to call flush to data coordinator: failed to flush collection 454538378847791404: etcdserver: request timed out)> (func_check.py:106)
[2024-12-11 16:47:25,861 - ERROR - fouram]: [func_time_catch] :  (api_request.py:127)
[2024-12-11 16:47:30,837 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-11 16:47:30,837 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-11 16:47:30,837 -  INFO - fouram]: grpc     hybrid_search                                                                    400     0(0.00%) |  79380     592  254805  79000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:47:30,837 -  INFO - fouram]: grpc     query                                                                            402     0(0.00%) |  67986     380  266126  58000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:47:30,837 -  INFO - fouram]: grpc     scene_test                                                                       439     2(0.46%) | 162801   63935  635327 148000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:47:30,837 -  INFO - fouram]: grpc     search                                                                           407     0(0.00%) | 111918   20009  196738 101000 |    0.00        0.00 (stats.py:789)
[2024-12-11 16:47:30,837 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-11 16:47:30,837 -  INFO - fouram]:          Aggregated                                                                      1648     2(0.12%) | 106858     380  635327  92000 |    0.00        0.00 (stats.py:789)

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DDL & DQL`
            verify DDL & DQL scenario,
            which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - scene_test
                    (collection: create->insert->flush->index->drop) <- drop and flush raise errors
                - search
                - hybrid_search
                - query
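For readers unfamiliar with the `scene_test` task, one iteration of the create->insert->flush->index->drop cycle from step 7 can be sketched as below. This is not the fouram implementation: `run_scene` and the stub step functions are hypothetical, and stopping at the first failing step is an assumption made to keep the sketch short.

```python
from typing import Callable, Dict, Optional

# Step order of one scene_test iteration, taken from the test steps above.
SCENE_STEPS = ("create", "insert", "flush", "index", "drop")


def run_scene(steps: Dict[str, Callable[[], None]]) -> Optional[str]:
    """Run one iteration in order.

    Returns the name of the first step that raised, or None if all steps passed.
    Stopping at the first failure is an assumption of this sketch.
    """
    for name in SCENE_STEPS:
        try:
            steps[name]()
        except Exception:
            return name
    return None


def ok() -> None:
    """Stub for a step that succeeds (stands in for a real client call)."""


def deadline() -> None:
    """Stub for a step that times out the way the log above shows."""
    raise TimeoutError("context deadline exceeded")


steps = {name: ok for name in SCENE_STEPS}
steps["flush"] = deadline
print(run_scene(steps))  # flush
```

In a real run each stub would wrap the matching client call, and several such iterations would run concurrently with the search, hybrid_search, and query tasks.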

@SimFG
Contributor

SimFG commented Dec 12, 2024

@wangting0128 It looks like the connection to etcd was interrupted.
(image attached)

[2024-12-11 16:45:53,159 - ERROR - fouram]: [CheckFunc] drop_collection request check failed, response:<MilvusException: (code=10001, message=context deadline exceeded)> (func_check.py:106)
[2024-12-11 16:47:25,861 - ERROR - fouram]: [CheckFunc] flush request check failed, response:<MilvusException: (code=65535, message=failed to call flush to data coordinator: failed to flush collection 454538378847791404: etcdserver: request timed out)> (func_check.py:106)

@wangting0128
Contributor Author

1. create_collection: context deadline exceeded

2. flush: failed to call flush to data coordinator

3. insert: getSegmentID failed: SegmentIDAllocator failRemainRequest

argo task: multi-vector-corn-1-1734616800
test case name: test_hybrid_search_locust_ddl_dql_cluster
image: master-20241219-8fcb33c2-amd64

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1734616800-5-etcd-0                           1/1     Running     0               12h     10.104.26.27    4am-node32   <none>           <none>
multi-vector-corn-1-1734616800-5-etcd-1                           1/1     Running     0               12h     10.104.15.12    4am-node20   <none>           <none>
multi-vector-corn-1-1734616800-5-etcd-2                           1/1     Running     0               12h     10.104.34.70    4am-node37   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-datanode-bb8dc54b7qs5v9   1/1     Running     3 (12h ago)     12h     10.104.17.163   4am-node23   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-indexnode-54b7f6bc9477q   1/1     Running     3 (12h ago)     12h     10.104.30.253   4am-node38   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-indexnode-54b7f6bcdxl5j   1/1     Running     3 (12h ago)     12h     10.104.21.216   4am-node24   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-indexnode-54b7f6bcpcwnd   1/1     Running     3 (12h ago)     12h     10.104.27.218   4am-node31   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-indexnode-54b7f6bcwmcgf   1/1     Running     3 (12h ago)     12h     10.104.19.247   4am-node28   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-mixcoord-bc8fbc4f92fg8c   1/1     Running     3 (12h ago)     12h     10.104.19.248   4am-node28   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-proxy-6cc5f995c-wpzv4     1/1     Running     3 (12h ago)     12h     10.104.19.250   4am-node28   <none>           <none>
multi-vector-corn-1-1734616800-5-milvus-querynode-5d76776dqxq6h   1/1     Running     3 (12h ago)     12h     10.104.16.109   4am-node21   <none>           <none>
multi-vector-corn-1-1734616800-5-minio-0                          1/1     Running     0               12h     10.104.18.49    4am-node25   <none>           <none>
multi-vector-corn-1-1734616800-5-minio-1                          1/1     Running     0               12h     10.104.34.65    4am-node37   <none>           <none>
multi-vector-corn-1-1734616800-5-minio-2                          1/1     Running     0               12h     10.104.15.7     4am-node20   <none>           <none>
multi-vector-corn-1-1734616800-5-minio-3                          1/1     Running     0               12h     10.104.33.40    4am-node36   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-bookie-0                1/1     Running     0               12h     10.104.34.64    4am-node37   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-bookie-1                1/1     Running     0               12h     10.104.18.50    4am-node25   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-bookie-2                1/1     Running     0               12h     10.104.33.43    4am-node36   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-bookie-init-zjkn8       0/1     Completed   0               12h     10.104.34.60    4am-node37   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-broker-0                1/1     Running     0               12h     10.104.26.25    4am-node32   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-broker-1                1/1     Running     0               12h     10.104.6.23     4am-node13   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-proxy-0                 1/1     Running     0               12h     10.104.13.100   4am-node16   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-proxy-1                 1/1     Running     0               12h     10.104.14.203   4am-node18   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-pulsar-init-s4ttw       0/1     Completed   0               12h     10.104.18.41    4am-node25   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-recovery-0              1/1     Running     0               12h     10.104.18.40    4am-node25   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-zookeeper-0             1/1     Running     0               12h     10.104.18.48    4am-node25   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-zookeeper-1             1/1     Running     0               12h     10.104.20.55    4am-node22   <none>           <none>
multi-vector-corn-1-1734616800-5-pulsarv3-zookeeper-2             1/1     Running     0               12h     10.104.15.8     4am-node20   <none>           <none>

client log:

[2024-12-19 18:33:47,025 - ERROR - fouram]: RPC error: [create_collection], <MilvusException: (code=10001, message=context deadline exceeded)>, <Time:{'RPC start': '2024-12-19 18:33:35.718792', 'RPC error': '2024-12-19 18:33:47.025072'}> (decorators.py:140)
[2024-12-19 18:33:47,041 - ERROR - fouram]: (api_response) : [Collection] <MilvusException: (code=10001, message=context deadline exceeded)>, [requestId: 861fba68-be37-11ef-833b-fa03bbfc0eb0] (api_request.py:57)
[2024-12-19 18:33:47,041 - ERROR - fouram]: [CheckFunc] init_collection request check failed, response:<MilvusException: (code=10001, message=context deadline exceeded)> (func_check.py:106)
[2024-12-19 18:33:47,043 - ERROR - fouram]: [func_time_catch] :  (api_request.py:127)
[2024-12-19 18:33:47,054 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: failed to flush collection 454719441252201192: context deadline exceeded)>, <Time:{'RPC start': '2024-12-19 18:33:35.763343', 'RPC error': '2024-12-19 18:33:47.054740'}> (decorators.py:140)
[2024-12-19 18:33:47,055 - ERROR - fouram]: (api_response) : [Collection.flush] <MilvusException: (code=65535, message=failed to call flush to data coordinator: failed to flush collection 454719441252201192: context deadline exceeded)>, [requestId: c1ea9edc-be37-11ef-833b-fa03bbfc0eb0] (api_request.py:57)
[2024-12-19 18:33:47,055 - ERROR - fouram]: [CheckFunc] flush request check failed, response:<MilvusException: (code=65535, message=failed to call flush to data coordinator: failed to flush collection 454719441252201192: context deadline exceeded)> (func_check.py:106)
[2024-12-19 18:33:47,055 - ERROR - fouram]: [func_time_catch] :  (api_request.py:127)
[2024-12-19 18:33:52,217 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]: grpc     hybrid_search                                                                    730     0(0.00%) |  79035     109  208875  82000 |    1.10        0.00 (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]: grpc     query                                                                            710     0(0.00%) |  63264      64  307829  71000 |    1.00        0.00 (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]: grpc     scene_test                                                                       758     2(0.26%) | 170239   63619  618416 161000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]: grpc     search                                                                           740     0(0.00%) | 108240   35043  207449  89000 |    0.40        0.00 (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]:          Aggregated                                                                      2938     2(0.07%) | 106110      64  618416  87000 |    2.50        0.00 (stats.py:789)
[2024-12-19 18:33:52,217 -  INFO - fouram]:  (stats.py:790)
[2024-12-19 18:33:52,219 -  INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]: Type     Name                                                                                  50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]: grpc     hybrid_search                                                                       82000  88000  96000 100000 134000 166000 177000 194000 209000 209000 209000    730 (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]: grpc     query                                                                               72000  85000  89000  99000 158000 173000 209000 237000 308000 308000 308000    710 (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]: grpc     scene_test                                                                         161000 192000 228000 238000 269000 306000 356000 417000 618000 618000 618000    758 (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]: grpc     search                                                                              89000 103000 141000 156000 171000 177000 197000 204000 207000 207000 207000    740 (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]:          Aggregated                                                                          87000 105000 153000 162000 195000 240000 289000 319000 615000 618000 618000   2938 (stats.py:819)
[2024-12-19 18:33:52,219 -  INFO - fouram]:  (stats.py:820)
[2024-12-19 18:34:12,221 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]: grpc     hybrid_search                                                                    730     0(0.00%) |  79035     109  208875  82000 |    1.10        0.00 (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]: grpc     query                                                                            710     0(0.00%) |  63264      64  307829  71000 |    1.00        0.00 (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]: grpc     scene_test                                                                       758     2(0.26%) | 170239   63619  618416 161000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]: grpc     search                                                                           740     0(0.00%) | 108240   35043  207449  89000 |    0.40        0.00 (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]:          Aggregated                                                                      2938     2(0.07%) | 106110      64  618416  87000 |    2.50        0.00 (stats.py:789)
[2024-12-19 18:34:12,221 -  INFO - fouram]:  (stats.py:790)
[2024-12-19 18:34:12,224 -  INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2024-12-19 18:34:12,224 -  INFO - fouram]: Type     Name                                                                                  50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]: grpc     hybrid_search                                                                       82000  88000  96000 100000 134000 166000 177000 194000 209000 209000 209000    730 (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]: grpc     query                                                                               72000  85000  89000  99000 158000 173000 209000 237000 308000 308000 308000    710 (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]: grpc     scene_test                                                                         161000 192000 228000 238000 269000 306000 356000 417000 618000 618000 618000    758 (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]: grpc     search                                                                              89000 103000 141000 156000 171000 177000 197000 204000 207000 207000 207000    740 (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]:          Aggregated                                                                          87000 105000 153000 162000 195000 240000 289000 319000 615000 618000 618000   2938 (stats.py:819)
[2024-12-19 18:34:12,225 -  INFO - fouram]:  (stats.py:820)
[2024-12-19 18:34:22,059 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=65535, message=getSegmentID failed: SegmentIDAllocator failRemainRequest err:failed to get collection 454719441252202244
)>, <Time:{'RPC start': '2024-12-19 18:33:59.656087', 'RPC error': '2024-12-19 18:34:22.059397'}> (decorators.py:140)
[2024-12-19 18:34:22,060 - ERROR - fouram]: (api_response) : [Collection.insert] <MilvusException: (code=65535, message=getSegmentID failed: SegmentIDAllocator failRemainRequest err:failed to get collection 454719441252202244
)>, [requestId: d0285412-be37-11ef-833b-fa03bbfc0eb0] (api_request.py:57)
[2024-12-19 18:34:22,060 - ERROR - fouram]: [CheckFunc] insert request check failed, response:<MilvusException: (code=65535, message=getSegmentID failed: SegmentIDAllocator failRemainRequest err:failed to get collection 454719441252202244
)> (func_check.py:106)
[2024-12-19 18:34:22,060 - ERROR - fouram]: [func_time_catch] :  (api_request.py:127)
[2024-12-19 18:34:32,230 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-19 18:34:32,231 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:34:32,231 -  INFO - fouram]: grpc     hybrid_search                                                                    730     0(0.00%) |  79035     109  208875  82000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:32,231 -  INFO - fouram]: grpc     query                                                                            710     0(0.00%) |  63264      64  307829  71000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:32,232 -  INFO - fouram]: grpc     scene_test                                                                       759     3(0.40%) | 170208   63619  618416 161000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:32,232 -  INFO - fouram]: grpc     search                                                                           740     0(0.00%) | 108240   35043  207449  89000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:32,232 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:34:32,232 -  INFO - fouram]:          Aggregated                                                                      2939     3(0.10%) | 106124      64  618416  87000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:32,232 -  INFO - fouram]:  (stats.py:790)
[2024-12-19 18:34:32,235 -  INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2024-12-19 18:34:32,236 -  INFO - fouram]: Type     Name                                                                                  50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs (stats.py:819)
[2024-12-19 18:34:32,236 -  INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-12-19 18:34:32,236 -  INFO - fouram]: grpc     hybrid_search                                                                       82000  88000  96000 100000 134000 166000 177000 194000 209000 209000 209000    730 (stats.py:819)
[2024-12-19 18:34:32,236 -  INFO - fouram]: grpc     query                                                                               72000  85000  89000  99000 158000 173000 209000 237000 308000 308000 308000    710 (stats.py:819)
[2024-12-19 18:34:32,237 -  INFO - fouram]: grpc     scene_test                                                                         161000 192000 228000 238000 269000 306000 356000 417000 618000 618000 618000    759 (stats.py:819)
[2024-12-19 18:34:32,237 -  INFO - fouram]: grpc     search                                                                              89000 103000 141000 156000 171000 177000 197000 204000 207000 207000 207000    740 (stats.py:819)
[2024-12-19 18:34:32,237 -  INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-12-19 18:34:32,237 -  INFO - fouram]:          Aggregated                                                                          87000 105000 153000 162000 195000 240000 289000 319000 615000 618000 618000   2939 (stats.py:819)
[2024-12-19 18:34:32,237 -  INFO - fouram]:  (stats.py:820)
[2024-12-19 18:34:46,140 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=65535, message=getSegmentID failed: SegmentIDAllocator failRemainRequest err:failed to get collection 454719441252202247
)>, <Time:{'RPC start': '2024-12-19 18:34:29.587167', 'RPC error': '2024-12-19 18:34:46.140511'}> (decorators.py:140)
[2024-12-19 18:34:46,144 - ERROR - fouram]: (api_response) : [Collection.insert] <MilvusException: (code=65535, message=getSegmentID failed: SegmentIDAllocator failRemainRequest err:failed to get collection 454719441252202247
)>, [requestId: e1ff7792-be37-11ef-833b-fa03bbfc0eb0] (api_request.py:57)
[2024-12-19 18:34:46,144 - ERROR - fouram]: [CheckFunc] insert request check failed, response:<MilvusException: (code=65535, message=getSegmentID failed: SegmentIDAllocator failRemainRequest err:failed to get collection 454719441252202247
)> (func_check.py:106)
[2024-12-19 18:34:46,144 - ERROR - fouram]: [func_time_catch] :  (api_request.py:127)
[2024-12-19 18:34:52,241 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-19 18:34:52,241 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:34:52,241 -  INFO - fouram]: grpc     hybrid_search                                                                    730     0(0.00%) |  79035     109  208875  82000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:52,241 -  INFO - fouram]: grpc     query                                                                            710     0(0.00%) |  63264      64  307829  71000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:52,242 -  INFO - fouram]: grpc     scene_test                                                                       760     4(0.53%) | 170208   63619  618416 161000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:52,242 -  INFO - fouram]: grpc     search                                                                           740     0(0.00%) | 108240   35043  207449  89000 |    0.00        0.00 (stats.py:789)
[2024-12-19 18:34:52,242 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-19 18:34:52,242 -  INFO - fouram]:          Aggregated                                                                      2940     4(0.14%) | 106146      64  618416  87000 |    0.00        0.00 (stats.py:789)
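Each `RPC error` line in the client log carries an `RPC start` and an `RPC error` timestamp, so the time a call waited before failing can be computed directly. The sketch below does that with the standard library only. `rpc_elapsed_seconds` is a hypothetical helper, and the pattern is inferred from the `<Time:{...}>` field in this log. For the `create_collection` and `flush` failures above it gives roughly 11.3 s each.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern inferred from the <Time:{...}> field in the client log (assumption).
TIME_RE = re.compile(r"'RPC start': '([^']+)', 'RPC error': '([^']+)'")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def rpc_elapsed_seconds(line: str) -> Optional[float]:
    """Return seconds between 'RPC start' and 'RPC error', or None if absent."""
    m = TIME_RE.search(line)
    if m is None:
        return None
    start, end = (datetime.strptime(ts, TS_FORMAT) for ts in m.groups())
    return (end - start).total_seconds()


line = ("<Time:{'RPC start': '2024-12-19 18:33:35.718792', "
        "'RPC error': '2024-12-19 18:33:47.025072'}>")
print(rpc_elapsed_seconds(line))  # 11.30628
```

For the `batch_insert` errors the message contains a line break, so the `<Time:...>` field sits on the continuation line and that line is the one to pass in.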

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DDL & DQL`
            verify DDL & DQL scenario,
            which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - scene_test
                    (collection: create->insert->flush->index->drop)
                - search
                - hybrid_search
                - query

@SimFG
Contributor

SimFG commented Dec 20, 2024

So far, the failure appears to be caused by an etcd timeout. I will look into the root cause.
(image attached)

@wangting0128
Contributor Author

Build index (create_index): context deadline exceeded

argo task: bitmap-corn-1734663600
test case name: test_bitmap_locust_dql_ddl_cluster
image: master-20241220-a7286465-amd64

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
bitmap-corn-1734663600-4-etcd-0                                   1/1     Running     0               4h40m   10.104.18.146   4am-node25   <none>           <none>
bitmap-corn-1734663600-4-etcd-1                                   1/1     Running     0               4h40m   10.104.32.98    4am-node39   <none>           <none>
bitmap-corn-1734663600-4-etcd-2                                   1/1     Running     0               4h40m   10.104.16.198   4am-node21   <none>           <none>
bitmap-corn-1734663600-4-milvus-datanode-584f96976b-xzvvw         1/1     Running     3 (4h39m ago)   4h40m   10.104.30.132   4am-node38   <none>           <none>
bitmap-corn-1734663600-4-milvus-indexnode-57fcb757cf-9tpxs        1/1     Running     3 (4h39m ago)   4h40m   10.104.21.70    4am-node24   <none>           <none>
bitmap-corn-1734663600-4-milvus-indexnode-57fcb757cf-qvn5n        1/1     Running     3 (4h39m ago)   4h40m   10.104.6.121    4am-node13   <none>           <none>
bitmap-corn-1734663600-4-milvus-mixcoord-844c944bcd-dv7c4         1/1     Running     3 (4h39m ago)   4h40m   10.104.19.218   4am-node28   <none>           <none>
bitmap-corn-1734663600-4-milvus-proxy-84d5766db4-62fpv            1/1     Running     0               4h40m   10.104.14.244   4am-node18   <none>           <none>
bitmap-corn-1734663600-4-milvus-querynode-56984c65f5-hpdw8        1/1     Running     0               4h40m   10.104.14.245   4am-node18   <none>           <none>
bitmap-corn-1734663600-4-milvus-querynode-56984c65f5-pf7gb        1/1     Running     3 (4h39m ago)   4h40m   10.104.19.219   4am-node28   <none>           <none>
bitmap-corn-1734663600-4-minio-0                                  1/1     Running     0               4h40m   10.104.18.149   4am-node25   <none>           <none>
bitmap-corn-1734663600-4-minio-1                                  1/1     Running     0               4h40m   10.104.33.152   4am-node36   <none>           <none>
bitmap-corn-1734663600-4-minio-2                                  1/1     Running     0               4h40m   10.104.24.51    4am-node29   <none>           <none>
bitmap-corn-1734663600-4-minio-3                                  1/1     Running     0               4h40m   10.104.32.107   4am-node39   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-bookie-0                        1/1     Running     0               4h40m   10.104.33.150   4am-node36   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-bookie-1                        1/1     Running     0               4h40m   10.104.24.45    4am-node29   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-bookie-2                        1/1     Running     0               4h40m   10.104.32.102   4am-node39   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-bookie-init-fs5fr               0/1     Completed   0               4h40m   10.104.25.174   4am-node30   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-broker-0                        1/1     Running     0               4h40m   10.104.25.173   4am-node30   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-broker-1                        1/1     Running     0               4h40m   10.104.16.184   4am-node21   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-proxy-0                         1/1     Running     0               4h40m   10.104.16.185   4am-node21   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-proxy-1                         1/1     Running     0               4h40m   10.104.32.87    4am-node39   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-pulsar-init-k66hf               0/1     Completed   0               4h40m   10.104.25.175   4am-node30   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-recovery-0                      1/1     Running     0               4h40m   10.104.24.34    4am-node29   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-zookeeper-0                     1/1     Running     0               4h40m   10.104.24.44    4am-node29   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-zookeeper-1                     1/1     Running     0               4h40m   10.104.18.147   4am-node25   <none>           <none>
bitmap-corn-1734663600-4-pulsarv3-zookeeper-2                     1/1     Running     0               4h40m   10.104.16.197   4am-node21   <none>           <none>
(screenshots attached: 2024-12-20 16:18:12, 2024-12-20 16:19:49)

client log:

[2024-12-20 05:05:45,593 - ERROR - fouram]: RPC error: [create_index], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:121 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:209 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).describeCollectionInternal
/workspace/source/internal/distributed/rootcoord/client/client.go:215 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollectionInternal
/workspace/source/internal/datacoord/broker/coordinator_broker.go:59 github.com/milvus-io/milvus/internal/datacoord/broker.(*coordinatorBroker).DescribeCollectionInternal
/workspace/source/internal/datacoord/index_service.go:168 github.com/milvus-io/milvus/internal/datacoord.(*Server).getFieldNameByID
/workspace/source/internal/datacoord/index_service.go:205 github.com/milvus-io/milvus/internal/datacoord.(*Server).CreateIndex
/workspace/source/internal/distributed/datacoord/service.go:460 github.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).CreateIndex: rpc error: code = DeadlineExceeded desc = context deadline exceeded)>, <Time:{'RPC start': '2024-12-20 05:05:35.784765', 'RPC error': '2024-12-20 05:05:45.593482'}> (decorators.py:140)

[2024-12-20 05:05:45,693 - ERROR - fouram]: RPC error: [create_index], <MilvusException: (code=65535, message=stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:121 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:209 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).describeCollectionInternal
/workspace/source/internal/distributed/rootcoord/client/client.go:215 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollectionInternal
/workspace/source/internal/datacoord/broker/coordinator_broker.go:59 github.com/milvus-io/milvus/internal/datacoord/broker.(*coordinatorBroker).DescribeCollectionInternal
/workspace/source/internal/datacoord/index_service.go:168 github.com/milvus-io/milvus/internal/datacoord.(*Server).getFieldNameByID
/workspace/source/internal/datacoord/index_service.go:205 github.com/milvus-io/milvus/internal/datacoord.(*Server).CreateIndex
/workspace/source/internal/distributed/datacoord/service.go:460 github.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).CreateIndex: rpc error: code = DeadlineExceeded desc = context deadline exceeded)>, <Time:{'RPC start': '2024-12-20 05:05:38.744567', 'RPC error': '2024-12-20 05:05:45.693431'}> (decorators.py:140)
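
For triage, the time each failed call spent before erroring can be read from the `RPC start` / `RPC error` timestamps at the end of each client log entry. A minimal Python sketch of that arithmetic follows; the helper name `rpc_elapsed_seconds` is hypothetical and not part of fouram or pymilvus:

```python
import re
from datetime import datetime

# Hypothetical triage helper; not part of fouram or pymilvus.
_TIME_RE = re.compile(r"'RPC start': '([^']+)', 'RPC error': '([^']+)'")
_FMT = "%Y-%m-%d %H:%M:%S.%f"


def rpc_elapsed_seconds(log_text: str) -> float:
    """Return the seconds between 'RPC start' and 'RPC error' in a log entry."""
    match = _TIME_RE.search(log_text)
    if match is None:
        raise ValueError("no RPC start/error timestamps found")
    start = datetime.strptime(match.group(1), _FMT)
    end = datetime.strptime(match.group(2), _FMT)
    return (end - start).total_seconds()


# Timestamp tails copied from the two create_index entries above.
first = "<Time:{'RPC start': '2024-12-20 05:05:35.784765', 'RPC error': '2024-12-20 05:05:45.593482'}>"
second = "<Time:{'RPC start': '2024-12-20 05:05:38.744567', 'RPC error': '2024-12-20 05:05:45.693431'}>"

print(rpc_elapsed_seconds(first))
print(rpc_elapsed_seconds(second))
```

For these two entries the elapsed times are about 9.81 s and 6.95 s, so the two calls did not wait the same length of time before the deadline error was returned.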

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `primary key: INT64`
            1. building `BITMAP` index on all supported 12 scalar fields, `INVERTED` index on pk field
            2. 2 fields of different vector types
            3. verify DQL & DML requests

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim
                'float_vector_1': 768dim
                'id': primary key type is INT64

                all scalar fields: varchar max_length=100, array max_capacity=13
            2. build indexes:
                HNSW: 'float_vector'
                IVF_SQ8: 'float_vector_1'

                BITMAP: all scalar fields
                INVERTED: 'id' primary key field
            3. insert 10 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - search
                - query
                - hybrid_search
                - scene_test
                    (collection: create->insert->flush->index->drop)
                - scene_search_test
                    (collection: create->insert->flush->index->load->search->drop)
                - scene_hybrid_search_test: 4 vector fields, 3 scalar fields
                    (collection: create->insert->flush->index->load->hybrid_search->drop)

@SimFG
Contributor

SimFG commented Dec 23, 2024

According to the monitoring, the likely cause is high latency of the etcd service's fsync operations, because the disk of the physical machine is congested.
(5 monitoring screenshots attached)
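
To check the disk independently of Milvus, one option is to time `fsync` calls directly on the volume that backs etcd. The following is a rough Python sketch using only the standard library. The function name, the 4 KiB write size, and the sample count are illustrative assumptions, and this is not how etcd itself measures WAL fsync latency:

```python
import os
import tempfile
import time


def measure_fsync_latency(directory: str, samples: int = 50, size: int = 4096) -> list:
    """Write `size` bytes then fsync, `samples` times; return sorted latencies in seconds."""
    payload = b"\0" * size
    latencies = []
    # Temporary file created on the disk under test and removed afterwards.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the write to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return sorted(latencies)


if __name__ == "__main__":
    # Assumption: the current directory lives on the disk under suspicion.
    result = measure_fsync_latency(".")
    median = result[len(result) // 2]
    p99 = result[min(len(result) - 1, int(len(result) * 0.99))]
    print(f"median={median * 1000:.2f} ms  p99={p99 * 1000:.2f} ms")
```

For comparison, etcd's documentation suggests keeping the 99th percentile of `etcd_disk_wal_fsync_duration_seconds` below roughly 10 ms. Values well above that on the etcd nodes would be consistent with the diagnosis above.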

yanliang567 modified the milestones: 2.5.0, 2.5.1, 2.5.2 on Dec 24, 2024
@wangting0128
Contributor Author

Drop collection also raises the error: context deadline exceeded

argo task: multi-vector-corn-1-1735308000
test case name: test_hybrid_search_locust_ddl_dql_cluster
image: 2.5-20241227-ef400227-amd64

server:

NAME                                                              READY   STATUS                   RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1735308000-5-etcd-0                           1/1     Running                  0               12h     10.104.18.229   4am-node25   <none>           <none>
multi-vector-corn-1-1735308000-5-etcd-1                           1/1     Running                  0               12h     10.104.33.10    4am-node36   <none>           <none>
multi-vector-corn-1-1735308000-5-etcd-2                           1/1     Running                  0               12h     10.104.34.19    4am-node37   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-datanode-7cb8f477dpj5l7   1/1     Running                  5 (12h ago)     12h     10.104.6.16     4am-node13   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-indexnode-74d669846bkm5   1/1     Running                  4 (12h ago)     12h     10.104.19.135   4am-node28   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-indexnode-74d669847tzdz   1/1     Running                  4 (12h ago)     12h     10.104.15.209   4am-node20   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-indexnode-74d66984nqdtg   1/1     Running                  3 (12h ago)     12h     10.104.14.147   4am-node18   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-indexnode-74d66984qftpr   1/1     Running                  4 (12h ago)     12h     10.104.16.134   4am-node21   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-mixcoord-657c95cbdhdxqc   1/1     Running                  3 (12h ago)     12h     10.104.14.149   4am-node18   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-proxy-86f8cdc5b6-zrfvv    1/1     Running                  3 (12h ago)     12h     10.104.14.146   4am-node18   <none>           <none>
multi-vector-corn-1-1735308000-5-milvus-querynode-75656b5c52dgn   1/1     Running                  4 (12h ago)     12h     10.104.24.204   4am-node29   <none>           <none>
multi-vector-corn-1-1735308000-5-minio-0                          1/1     Running                  0               12h     10.104.18.230   4am-node25   <none>           <none>
multi-vector-corn-1-1735308000-5-minio-1                          1/1     Running                  0               12h     10.104.34.18    4am-node37   <none>           <none>
multi-vector-corn-1-1735308000-5-minio-2                          1/1     Running                  0               12h     10.104.33.11    4am-node36   <none>           <none>
multi-vector-corn-1-1735308000-5-minio-3                          1/1     Running                  0               12h     10.104.25.108   4am-node30   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-bookie-0                1/1     Running                  0               12h     10.104.18.231   4am-node25   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-bookie-1                1/1     Running                  0               12h     10.104.33.12    4am-node36   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-bookie-2                1/1     Running                  0               12h     10.104.34.22    4am-node37   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-bookie-init-9m6ll       0/1     Completed                0               12h     10.104.14.148   4am-node18   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-broker-0                1/1     Running                  0               12h     10.104.13.99    4am-node16   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-broker-1                1/1     Running                  0               12h     10.104.9.56     4am-node14   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-proxy-0                 1/1     Running                  0               12h     10.104.9.58     4am-node14   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-proxy-1                 1/1     Running                  0               12h     10.104.13.101   4am-node16   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-pulsar-init-64kw6       0/1     Completed                0               12h     10.104.13.98    4am-node16   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-recovery-0              1/1     Running                  0               12h     10.104.9.57     4am-node14   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-zookeeper-0             1/1     Running                  0               12h     10.104.18.228   4am-node25   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-zookeeper-1             1/1     Running                  0               12h     10.104.34.14    4am-node37   <none>           <none>
multi-vector-corn-1-1735308000-5-pulsarv3-zookeeper-2             1/1     Running                  0               12h     10.104.25.106   4am-node30   <none>           <none>
(Screenshot 2024-12-30 10 51 12)

client log:

[2024-12-27 23:47:18,834 - ERROR - fouram]: RPC error: [drop_collection], <MilvusException: (code=10001, message=context deadline exceeded)>, <Time:{'RPC start': '2024-12-27 23:47:08.827631', 'RPC error': '2024-12-27 23:47:18.834769'}> (decorators.py:140)
[2024-12-27 23:47:18,838 - ERROR - fouram]: (api_response) : [drop_collection] <MilvusException: (code=10001, message=context deadline exceeded)>, [requestId: e2ae6b14-c4ac-11ef-9184-9a81d715d86a] (api_request.py:57)
[2024-12-27 23:47:18,838 - ERROR - fouram]: [CheckFunc] drop_collection request check failed, response:<MilvusException: (code=10001, message=context deadline exceeded)> (func_check.py:106)
[2024-12-27 23:47:18,839 - ERROR - fouram]: [func_time_catch] :  (api_request.py:127)
[2024-12-27 23:47:30,979 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-12-27 23:47:30,979 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-27 23:47:30,979 -  INFO - fouram]: grpc     hybrid_search                                                                   1687     0(0.00%) |  74713     173  260734  79000 |    0.00        0.00 (stats.py:789)
[2024-12-27 23:47:30,980 -  INFO - fouram]: grpc     query                                                                           1705     0(0.00%) |  59048     130  338955  55000 |    0.00        0.00 (stats.py:789)
[2024-12-27 23:47:30,980 -  INFO - fouram]: grpc     scene_test                                                                      1624     1(0.06%) | 170288   63552  710412 153000 |    0.10        0.10 (stats.py:789)
[2024-12-27 23:47:30,980 -  INFO - fouram]: grpc     search                                                                          1673     0(0.00%) | 104879   17793  281711  88000 |    0.00        0.00 (stats.py:789)
[2024-12-27 23:47:30,980 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-12-27 23:47:30,980 -  INFO - fouram]:          Aggregated                                                                      6689     1(0.01%) | 101469     130  710412  86000 |    0.10        0.10 (stats.py:789)
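
The summary rows above can be checked mechanically. The sketch below parses one row of the stats table and recomputes the failure percentage. `parse_stats_row` is a hypothetical helper, and it assumes the timing columns are in milliseconds, which is Locust's default:

```python
def parse_stats_row(row: str) -> dict:
    """Parse one 'Type Name #reqs #fails | Avg Min Max Med | req/s failures/s' row."""
    head, timings, rates = (part.split() for part in row.split("|"))
    reqs = int(head[-2])
    fails = int(head[-1].split("(")[0])  # "1(0.06%)" -> 1
    avg, low, high, med = (int(v) for v in timings)
    return {
        "name": head[-3],
        "reqs": reqs,
        "fails": fails,
        "fail_pct": 100.0 * fails / reqs if reqs else 0.0,
        "avg_ms": avg,
        "min_ms": low,
        "max_ms": high,
        "med_ms": med,
        "req_per_s": float(rates[0]),
    }


# The scene_test row from the table above, with the log prefix removed.
row = "grpc     scene_test     1624     1(0.06%) | 170288   63552  710412 153000 |    0.10        0.10"
stats = parse_stats_row(row)
print(stats["name"], stats["fails"], round(stats["fail_pct"], 2), stats["avg_ms"] / 1000)
```

Under that assumption, `scene_test` averaged about 170 s per iteration, and the single failure out of 1624 requests is the drop_collection timeout shown above.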

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DDL & DQL`
            verify DDL & DQL scenario,
            which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - scene_test
                    (collection: create->insert->flush->index->drop)
                - search
                - hybrid_search
                - query

yanliang567 modified the milestones: 2.5.2, 2.5.3 on Jan 6, 2025