[Bug]: [benchmark][pulsarv3] insert 50000 128dim data timeout #37929

Closed

wangting0128 opened this issue Nov 22, 2024 · 2 comments
Labels: kind/bug · test/benchmark · triage/accepted
Milestone: 2.5.0

Comments
@wangting0128 (Contributor)

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20241122-06d73cf2-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2):2.5.0rc124
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-9dhlb-wt-3
test case name: test_ivf_flat_search_filter_cluster

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-9dhlb-wt-3-95-2001-etcd-0                                 1/1     Running     0               3m52s   10.104.24.19    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-etcd-1                                 1/1     Running     0               3m52s   10.104.20.82    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-etcd-2                                 1/1     Running     0               3m52s   10.104.19.230   4am-node28   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-datanode-b9758796-24pql         1/1     Running     2 (3m19s ago)   3m52s   10.104.24.7     4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-indexnode-697d7b67dd-p2796      1/1     Running     1 (3m40s ago)   3m52s   10.104.14.125   4am-node18   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-mixcoord-68c56fd49f-w5bv8       1/1     Running     2 (3m30s ago)   3m52s   10.104.20.73    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-proxy-6dd7764997-h6kxd          1/1     Running     2 (3m29s ago)   3m52s   10.104.5.155    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-querynode-7fcd84758b-bn26z      1/1     Running     2 (3m30s ago)   3m52s   10.104.16.22    4am-node21   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-0                                1/1     Running     0               3m52s   10.104.24.17    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-1                                1/1     Running     0               3m52s   10.104.20.79    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-2                                1/1     Running     0               3m51s   10.104.19.231   4am-node28   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-3                                1/1     Running     0               3m51s   10.104.34.43    4am-node37   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-0                      1/1     Running     0               3m52s   10.104.24.18    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-1                      1/1     Running     0               3m52s   10.104.20.81    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-2                      1/1     Running     0               3m51s   10.104.19.232   4am-node28   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-init-cwz8s             0/1     Completed   0               3m52s   10.104.5.154    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-broker-0                      1/1     Running     0               3m52s   10.104.6.224    4am-node13   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-broker-1                      1/1     Running     0               3m52s   10.104.5.157    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-proxy-0                       1/1     Running     0               3m51s   10.104.5.159    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-proxy-1                       1/1     Running     0               3m51s   10.104.6.225    4am-node13   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-pulsar-init-xlg2t             0/1     Completed   0               3m52s   10.104.24.6     4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-recovery-0                    1/1     Running     1 (3m ago)      3m52s   10.104.5.156    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-zookeeper-0                   1/1     Running     0               3m52s   10.104.24.16    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-zookeeper-1                   1/1     Running     0               3m52s   10.104.20.80    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-zookeeper-2                   1/1     Running     0               3m52s   10.104.19.229   4am-node28   <none>           <none>
(screenshot attached in the original issue: 2024-11-22 14:18:16)

client log:

[2024-11-22 06:10:49,618 -  INFO - fouram]: [Base] Connection params: {'alias': 'default', 'host': 'fouramf-9dhlb-wt-3-95-2001-milvus.qa-milvus.svc.cluster.local', 'port': '19530', 'uri': '', 'secure': False, 'user': '', 'password': '', 'token': '', 'db_name': ''} (base.py:240)
[2024-11-22 06:10:49,637 -  INFO - fouram]: [Base] Start clean all collections [] (base.py:289)
[2024-11-22 06:10:49,639 -  INFO - fouram]: [Base] Create collection fouram_gFuUAM3g (base.py:273)
[2024-11-22 06:10:49,759 -  INFO - fouram]: [Base] Collection schema: 
{'auto_id': False,
 'description': '',
 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}},
            {'name': 'int64_1', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_2', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'float_1', 'description': '', 'type': <DataType.FLOAT: 10>},
            {'name': 'double_1', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'varchar_1', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}],
 'enable_dynamic_field': False} (base.py:329)
[2024-11-22 06:10:49,759 -  INFO - fouram]: [CommonCases] Prepare collection fouram_gFuUAM3g done. (common_cases.py:77)
[2024-11-22 06:10:49,764 -  INFO - fouram]: [Base] Collection:fouram_gFuUAM3g is not building index (base.py:491)
[2024-11-22 06:10:49,764 -  INFO - fouram]: [Base] Start release collection fouram_gFuUAM3g (base.py:324)
[2024-11-22 06:10:49,784 -  INFO - fouram]: [Base] Clean all index done. (base.py:515)
[2024-11-22 06:10:49,785 -  INFO - fouram]: [Base] Start build index of IVF_FLAT for field:float_vector collection:fouram_gFuUAM3g, params:{'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 2048}}, kwargs:{} (base.py:469)
[2024-11-22 06:10:50,313 -  INFO - fouram]: [Time] Index run in 0.5285s (api_request.py:49)
[2024-11-22 06:10:50,313 -  INFO - fouram]: [CommonCases] RT of build index IVF_FLAT: 0.5285s (common_cases.py:162)
[2024-11-22 06:10:50,313 -  INFO - fouram]: [CommonCases] Prepare index IVF_FLAT done. (common_cases.py:164)
[2024-11-22 06:10:50,314 -  INFO - fouram]: [CommonCases] No scalar and vector fields need to be indexed. (common_cases.py:183)
[2024-11-22 06:10:50,315 -  INFO - fouram]: [Base] Index params of fouram_gFuUAM3g:[{'float_vector': {'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 2048}}}] (base.py:488)
[2024-11-22 06:10:50,316 -  INFO - fouram]: [Base] Start inserting 50000000 vectors to collection fouram_gFuUAM3g (base.py:383)
[2024-11-22 06:10:50,390 -  INFO - fouram]: [Base] Start inserting, ids: 0 - 49999, data size: 50,000,000 (base.py:363)
[2024-11-22 06:11:22,060 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=65535, message=message send timeout: TimeoutError)>, <Time:{'RPC start': '2024-11-22 06:10:50.790432', 'RPC error': '2024-11-22 06:11:22.060552'}> (decorators.py:140)
[2024-11-22 06:11:22,062 - ERROR - fouram]: (api_response) : [Collection.insert] <MilvusException: (code=65535, message=message send timeout: TimeoutError)>, [requestId: 85bb41d2-a898-11ef-b68d-7a0719bd6d08] (api_request.py:57)
[2024-11-22 06:11:22,062 - ERROR - fouram]: [CheckFunc] insert request check failed, response:<MilvusException: (code=65535, message=message send timeout: TimeoutError)> (func_check.py:106)

Expected Behavior

No response

Steps To Reproduce

1. create a collection with fields: 'id' (primary key), 'float_vector' (128 dim), 'int64_1', 'int64_2', 'float_1', 'double_1', 'varchar_1'
2. build an IVF_FLAT index on 'float_vector'
3. insert 50000 rows <- timeout (see the sketch below)
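
Below is a minimal pymilvus sketch of these three steps, for reference only: the host, collection name, and random data are placeholders rather than the benchmark's fouram client, and the pymilvus 2.5 ORM API is assumed.

from pymilvus import (
    Collection, CollectionSchema, FieldSchema, DataType, connections,
)
import random

connections.connect(alias="default", host="127.0.0.1", port="19530")  # placeholder host

# Step 1: collection with the same fields as the failing test case.
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=128),
    FieldSchema("int64_1", DataType.INT64),
    FieldSchema("int64_2", DataType.INT64),
    FieldSchema("float_1", DataType.FLOAT),
    FieldSchema("double_1", DataType.DOUBLE),
    FieldSchema("varchar_1", DataType.VARCHAR, max_length=256),
]
collection = Collection("repro_insert_timeout", CollectionSchema(fields), shards_num=2)

# Step 2: build IVF_FLAT on the vector field before inserting.
collection.create_index(
    "float_vector",
    {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 2048}},
)

# Step 3: one 50,000-row batch (ni_per) -- this is the insert that times out.
nb = 50000
collection.insert([
    list(range(nb)),                                              # id
    [[random.random() for _ in range(128)] for _ in range(nb)],   # float_vector
    [random.randint(0, 100) for _ in range(nb)],                  # int64_1
    [random.randint(0, 100) for _ in range(nb)],                  # int64_2
    [random.random() for _ in range(nb)],                         # float_1
    [random.random() for _ in range(nb)],                         # double_1
    [str(i) for i in range(nb)],                                  # varchar_1
])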

Milvus Log

No response

Anything else?

server config:

{
     "queryNode": {
          "resources": {
               "limits": {
                    "cpu": "16.0",
                    "memory": "64Gi"
               },
               "requests": {
                    "cpu": "9.0",
                    "memory": "33Gi"
               }
          }
     },
     "indexNode": {
          "resources": {
               "limits": {
                    "cpu": "16.0",
                    "memory": "20Gi"
               },
               "requests": {
                    "cpu": "9.0",
                    "memory": "11Gi"
               }
          },
          "replicas": 1
     },
     "dataNode": {
          "resources": {
               "limits": {
                    "cpu": "2.0",
                    "memory": "4Gi"
               },
               "requests": {
                    "cpu": "2.0",
                    "memory": "3Gi"
               }
          },
          "replicas": 1
     },
     "cluster": {
          "enabled": true
     },
     "pulsar": {
          "enabled": false
     },
     "kafka": {},
     "minio": {
          "metrics": {
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "etcd": {
          "metrics": {
               "enabled": true,
               "podMonitor": {
                    "enabled": true
               }
          },
          "image": {
               "tag": "3.5.16-r1"
          }
     },
     "metrics": {
          "serviceMonitor": {
               "enabled": true
          }
     },
     "log": {
          "level": "debug"
     },
     "pulsarv3": {
          "enabled": true,
          "broker": {
               "podMonitor": {
                    "enabled": true
               }
          },
          "bookkeeper": {
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "image": {
          "all": {
               "repository": "harbor.milvus.io/milvus/milvus",
               "tag": "master-20241122-06d73cf2-amd64"
          }
     }
}

client config:

{
     "dataset_params": {
          "metric_type": "L2",
          "dim": 128,
          "dataset_name": "sift",
          "dataset_size": 50000000,
          "ni_per": 50000,
          "req_run_counts": 10
     },
     "collection_params": {
          "other_fields": [
               "int64_1",
               "int64_2",
               "float_1",
               "double_1",
               "varchar_1"
          ],
          "shards_num": 2
     },
     "search_params": {
          "expr": [
               {
                    "float_1": {
                         "GT": -1,
                         "LT": 5000000
                    }
               },
               {
                    "float_1": {
                         "GT": -1,
                         "LT": 25000000
                    }
               },
               {
                    "float_1": {
                         "GT": -1,
                         "LT": 45000000
                    }
               }
          ],
          "top_k": [
               1,
               10,
               100,
               1000
          ],
          "nq": [
               1,
               10,
               100,
               200,
               500,
               1000,
               1200
          ],
          "search_param": {
               "nprobe": [
                    8,
                    32
               ]
          }
     },
     "index_params": {
          "index_type": "IVF_FLAT",
          "index_param": {
               "nlist": 2048
          }
     }
}
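
For context on what this config drives, here is a hedged sketch of one search iteration it encodes (nq=10, top_k=100, nprobe=32, and the first expr entry). "collection" is the Collection object from the sketch above, and the GT/LT pairs are assumed to translate into boolean filter expressions as written:

import random

nq, top_k = 10, 100
query_vectors = [[random.random() for _ in range(128)] for _ in range(nq)]

collection.load()  # the collection must be loaded before searching
results = collection.search(
    data=query_vectors,
    anns_field="float_vector",
    param={"metric_type": "L2", "params": {"nprobe": 32}},
    limit=top_k,
    expr="float_1 > -1 && float_1 < 5000000",  # first expr entry: {"GT": -1, "LT": 5000000}
)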
wangting0128 added the kind/bug, needs-triage, and test/benchmark labels on Nov 22, 2024
wangting0128 added this to the 2.5.0 milestone on Nov 22, 2024
@LoveEachDay (Contributor)

@wangting0128 The newly deployed Pulsar cluster failed to apply the nettyMaxFrameSizeBytes configuration change, which triggers an error when the message size is too large:

2024-11-22T06:27:58,292+0000 [bookie-io-8-87] ERROR org.apache.bookkeeper.proto.BookieRequestHandler - Unhandled exception occurred in I/O thread or handler on [id: 0xd23eddeb, L:/10.104.20.81:3181 - R:/10.104.5.157:49740]
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 5253120: 5346562 - discarded
	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:507) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:493) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.exceededFrameLength(LengthFieldBasedFrameDecoder.java:377) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:423) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:333) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[io.netty-netty-handler-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]

We'll change the default nettyMaxFrameSizeBytes in the next release of the milvus-helm chart.
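
For anyone hitting this before the chart update lands, the bookie frame limit can likely be raised through the pulsarv3 configData overrides. The snippet below follows the Apache Pulsar chart's configData convention and the values are illustrative only (the 5253120 limit in the stack trace appears to be the stock default, 5 MiB plus about 10 KiB of framing headroom, and the rejected frame was 5346562 bytes, just above it), so treat it as an assumption rather than the final helm fix:

"pulsarv3": {
     "bookkeeper": {
          "configData": {
               "nettyMaxFrameSizeBytes": "104867840"
          }
     },
     "broker": {
          "configData": {
               "maxMessageSize": "104857600"
          }
     }
}

The bookie's nettyMaxFrameSizeBytes should stay somewhat larger than the broker's maxMessageSize so that entry metadata does not push an otherwise-legal message over the frame limit.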

yanliang567 added the triage/accepted label and removed the needs-triage label on Nov 22, 2024
yanliang567 removed their assignment on Nov 22, 2024
@wangting0128 (Contributor, Author)

Verification passed.

argo task: fouramf-n59lz
test case name: test_ivf_flat_search_filter_cluster
