
[Bug]: [benchmark] diskann index inserts 100 million data, querynode disk usage peaks at over 100G #25163

Closed
1 task done
elstic opened this issue Jun 27, 2023 · 26 comments
Assignees
Labels
kind/bug (Issues or changes related to a bug), test/benchmark (benchmark test), triage/accepted (Indicates an issue or PR is ready to be actively worked on)
Milestone

Comments

@elstic
Contributor

elstic commented Jun 27, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.2.0-20230626-eac54cbb
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.4.0.dev36
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task : fouramf-concurrent-n5lrq, id : 2
case: test_concurrent_locust_100m_diskann_ddl_dql_filter_cluster
This is a frequently run test case, and it passed on previous versions.

server:

fouram-45-7069-etcd-0                                             1/1     Running       0               6m4s    10.104.4.106    4am-node11   <none>           <none>
fouram-45-7069-etcd-1                                             1/1     Running       0               6m4s    10.104.20.203   4am-node22   <none>           <none>
fouram-45-7069-etcd-2                                             1/1     Running       0               6m3s    10.104.15.13    4am-node20   <none>           <none>
fouram-45-7069-milvus-datacoord-7685b67fc-pl6r5                   1/1     Running       1 (2m3s ago)    6m4s    10.104.4.87     4am-node11   <none>           <none>
fouram-45-7069-milvus-datanode-f87b86d88-n4xwz                    1/1     Running       1 (2m4s ago)    6m4s    10.104.21.17    4am-node24   <none>           <none>
fouram-45-7069-milvus-indexcoord-79b9795579-jzl68                 1/1     Running       1 (2m3s ago)    6m4s    10.104.9.239    4am-node14   <none>           <none>
fouram-45-7069-milvus-indexnode-86c4d777c4-q4brg                  1/1     Running       0               6m4s    10.104.9.238    4am-node14   <none>           <none>
fouram-45-7069-milvus-proxy-78d5df4cdc-27znx                      1/1     Running       1 (2m3s ago)    6m4s    10.104.4.88     4am-node11   <none>           <none>
fouram-45-7069-milvus-querycoord-7cb6c4ddb8-wstvc                 1/1     Running       1 (2m3s ago)    6m4s    10.104.4.89     4am-node11   <none>           <none>
fouram-45-7069-milvus-querynode-867596d85b-hk6rz                  1/1     Running       0               6m4s    10.104.6.50     4am-node13   <none>           <none>
fouram-45-7069-milvus-rootcoord-d7c486488-kqwhq                   1/1     Running       1 (2m3s ago)    6m4s    10.104.9.240    4am-node14   <none>           <none>
fouram-45-7069-minio-0                                            1/1     Running       0               6m4s    10.104.6.54     4am-node13   <none>           <none>
fouram-45-7069-minio-1                                            1/1     Running       0               6m4s    10.104.4.104    4am-node11   <none>           <none>
fouram-45-7069-minio-2                                            1/1     Running       0               6m4s    10.104.16.227   4am-node21   <none>           <none>
fouram-45-7069-minio-3                                            1/1     Running       0               6m3s    10.104.20.205   4am-node22   <none>           <none>
fouram-45-7069-pulsar-bookie-0                                    1/1     Running       0               6m4s    10.104.4.103    4am-node11   <none>           <none>
fouram-45-7069-pulsar-bookie-1                                    1/1     Running       0               6m4s    10.104.15.11    4am-node20   <none>           <none>
fouram-45-7069-pulsar-bookie-2                                    1/1     Running       0               6m4s    10.104.16.230   4am-node21   <none>           <none>
fouram-45-7069-pulsar-bookie-init-shm2z                           0/1     Completed     0               6m4s    10.104.15.5     4am-node20   <none>           <none>
fouram-45-7069-pulsar-broker-0                                    1/1     Running       0               6m4s    10.104.15.6     4am-node20   <none>           <none>
fouram-45-7069-pulsar-proxy-0                                     1/1     Running       0               6m4s    10.104.16.225   4am-node21   <none>           <none>
fouram-45-7069-pulsar-pulsar-init-8jk8d                           0/1     Completed     0               6m4s    10.104.15.254   4am-node20   <none>           <none>
fouram-45-7069-pulsar-recovery-0                                  1/1     Running       0               6m4s    10.104.4.90     4am-node11   <none>           <none>
fouram-45-7069-pulsar-zookeeper-0                                 1/1     Running       0               6m4s    10.104.21.19    4am-node24   <none>           <none>
fouram-45-7069-pulsar-zookeeper-1                                 1/1     Running       0               4m57s   10.104.6.57     4am-node13   <none>           <none>
fouram-45-7069-pulsar-zookeeper-2                                 1/1     Running       0               4m20s   10.104.5.94     4am-node12   <none>           <none>

client log:

[2023-06-26 12:02:38,308 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 99900000 (base.py:468)
[2023-06-26 12:02:38,459 -  INFO - fouram]: [Base] Start inserting, ids: 99950000 - 99999999, data size: 100,000,000 (base.py:308)
[2023-06-26 12:02:40,008 -  INFO - fouram]: [Time] Collection.insert run in 1.5493s (api_request.py:45)
[2023-06-26 12:02:40,011 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 99900000 (base.py:468)
[2023-06-26 12:02:40,062 -  INFO - fouram]: [Base] Total time of insert: 3187.9628s, average number of vector bars inserted per second: 31367.9946, average time to insert 50000 vectors per time: 1.594s (base.py:379)
[2023-06-26 12:02:40,062 -  INFO - fouram]: [Base] Start flush collection fouram_0dreLHRo (base.py:277)
[2023-06-26 12:02:43,125 -  INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-26 12:02:43,125 -  INFO - fouram]: [Base] Start release collection fouram_0dreLHRo (base.py:288)
[2023-06-26 12:02:43,127 -  INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_0dreLHRo, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:427)
[2023-06-26 17:34:27,390 -  INFO - fouram]: [Time] Index run in 19904.2613s (api_request.py:45)
[2023-06-26 17:34:27,391 -  INFO - fouram]: [CommonCases] RT of build index DISKANN: 19904.2613s (common_cases.py:96)
[2023-06-26 17:34:27,416 -  INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-26 17:34:27,416 -  INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-26 17:34:27,416 -  INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-26 17:34:27,418 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_0dreLHRo): 100000000 (base.py:468)
[2023-06-26 17:34:27,418 -  INFO - fouram]: [Base] Start load collection fouram_0dreLHRo,replica_number:1,kwargs:{} (base.py:283)
[2023-06-26 18:51:04,491 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 18:51:04.489527', 'RPC error': '2023-06-26 18:51:04.491093'}> (decorators.py:108)
[2023-06-26 18:51:04,493 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 17:34:27.474903', 'RPC error': '2023-06-26 18:51:04.493072'}> (decorators.py:108)
[2023-06-26 18:51:04,493 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)>, <Time:{'RPC start': '2023-06-26 17:34:27.418905', 'RPC error': '2023-06-26 18:51:04.493252'}> (decorators.py:108)
[2023-06-26 18:51:04,494 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)> (api_request.py:53)
[2023-06-26 18:51:04,495 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=1, message=failed to load segment: follower 12 failed to load segment, reason load segment failed, disk space is not enough, collectionID = 442440457093644855, usedDiskAfterLoad = 100294 MB, totalDisk = 102400 MB, thresholdFactor = 0.950000)> (func_check.py:52)
FAILED

client pod : fouramf-concurrent-n5lrq-1120963268
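
The failure above appears to come from a load-time disk admission check of the form usedDiskAfterLoad > thresholdFactor x totalDisk (inferred from the wording of the error message, not from the Milvus source). Plugging in the numbers from the log as a quick sanity check:

# Numbers are taken verbatim from the error message above; the check itself is
# inferred from the message wording.
used_disk_after_load_mb = 100294
total_disk_mb = 102400
threshold_factor = 0.95

allowed_mb = threshold_factor * total_disk_mb   # 97280.0 MB
print(used_disk_after_load_mb > allowed_mb)     # True -> "disk space is not enough"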

Expected Behavior

Load succeeds.

Steps To Reproduce

1. create a collection or use an existing collection
2. build the DISKANN index on the vector column
3. insert a certain number of vectors  => 100m
4. flush the collection
5. build the index on the vector column again with the same parameters
6. optionally build an index on the scalar columns
7. count the total number of rows
8. load the collection  ==> failed (see the sketch below)
# 9. perform concurrent operations
# 10. clean up all collections or not
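
A minimal pymilvus sketch of steps 1-8 (collection name, dimension, endpoint, and batch size are placeholders; the actual benchmark runs through the fouram harness):

from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
import numpy as np

connections.connect(host="127.0.0.1", port="19530")    # assumed endpoint

dim = 128                                              # assumed dimension
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=dim),
])
collection = Collection("diskann_100m_repro", schema)  # step 1

# steps 2/5: build the DISKANN index on the vector column (empty params, as in the log)
collection.create_index(
    "float_vector",
    {"index_type": "DISKANN", "metric_type": "L2", "params": {}},
)

# step 3: insert in batches (100m rows in the real case; a single small batch here)
batch = 50_000
collection.insert([
    list(range(batch)),
    np.random.random((batch, dim)).tolist(),
])

collection.flush()                                     # step 4
print(collection.num_entities)                         # step 7

# step 8: load; in the benchmark this is where the querynode's 100Gi
# ephemeral-storage limit is exceeded and the load fails
collection.load(replica_number=1)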

Milvus Log

No response

Anything else?

No response

@elstic elstic added the kind/bug, needs-triage, and test/benchmark labels Jun 27, 2023
@elstic elstic added this to the 2.2.11 milestone Jun 27, 2023
@yanliang567
Contributor

/assign @xige-16
/unassign

@sre-ci-robot sre-ci-robot assigned xige-16 and unassigned yanliang567 Jun 27, 2023
@yanliang567 yanliang567 added the triage/accepted label and removed the needs-triage label Jun 27, 2023
@xiaofan-luan
Collaborator

disk space is not enough

As the error said: disk space is not enough

@xiaofan-luan
Collaborator

@elstic
probably need to check the disk space?

@elstic
Contributor Author

elstic commented Jun 27, 2023

@elstic probably need to check the disk space?

@xiaofan-luan

We do have a parameter that limits the disk in this way, but this case passed on previous versions, and we did not change the case parameters or any other configuration:

 --set queryNode.resources.limits.cpu=8,queryNode.resources.limits.memory=32Gi,queryNode.resources.limits.ephemeral-storage=100Gi

So I assume the current image needs more disk space.
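
For reference, a sketch of the same limits in values-dict form (mirroring the querynode disk setup shown later in this thread; purely illustrative):

# Equivalent of the --set flags above, expressed as a Helm values dict.
query_node_values = {
    'queryNode': {
        'resources': {
            'limits': {
                'cpu': '8',
                'memory': '32Gi',
                'ephemeral-storage': '100Gi',  # the disk limit the querynode is hitting
            }
        }
    }
}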

@xiaofan-luan
Collaborator

@xige-16
pls take a glance at it

@elstic elstic changed the title [Bug]: [benchmark]diskann index load failed after inserting 100 million data [Bug]: [benchmark]diskann index failed to load after inserting 100 million data with excessive disk usage Jun 29, 2023
@elstic
Contributor Author

elstic commented Jun 30, 2023

Comparison of querynode disk usage during DISKANN index load after inserting 100m data, most recent version vs. previous version:

image: 2.2.0-20230627-936ebf32, peak disk usage is about 104G; after stabilization, about 56.3G
(screenshot)

image: v2.2.7, peak disk usage is around 65G; stabilized at around 56G
(screenshot)

For the same case, the peak disk usage in v2.2.7 is around 65G, while the current version occasionally exceeds 100G, i.e. roughly 40G more at peak.

@xige-16

@yanliang567 yanliang567 modified the milestones: 2.2.11, 2.2.12 Jul 3, 2023
@xige-16
Contributor

xige-16 commented Jul 3, 2023

The load process of Milvus has remained unchanged; you could test whether this is caused by the Knowhere upgrade.

@xige-16
Contributor

xige-16 commented Jul 3, 2023

Comparison of querynode disk usage during DISKANN index load after inserting 100m data, most recent version vs. previous version:

image: 2.2.0-20230627-936ebf32, peak disk usage is about 104G; after stabilization, about 56.3G

image: v2.2.7, peak disk usage is around 65G; stabilized at around 56G

For the same case, the peak disk usage in v2.2.7 is around 65G, while the current version occasionally exceeds 100G, i.e. roughly 40G more at peak.

@xige-16

This picture shows that MinIO's disk space is not enough; the MinIO pods in the new version use more disk space than in the old version.

@LoveEachDay
Contributor

Querynodes are evicted because their disk usage exceeds 100GB, the limit set by queryNode.resources.limits.ephemeral-storage=100Gi.

@xiaofan-luan
Collaborator

Might this be related to a compaction issue?

@xige-16
Contributor

xige-16 commented Jul 4, 2023

Might this be related to a compaction issue?

There are two phenomena in this issue: disk usage on both MinIO and the querynode increases during the load process, but disk usage in the final state is unchanged, which indicates the size of the index itself has not changed. Most likely the old segments are not cleaned up in time after compaction. I will check the logs to confirm.

@elstic
Contributor Author

elstic commented Jul 19, 2023

This issue still exists.

Querynode disk usage up to the point where search becomes stable:
(screenshot)

server:
(The querynode was evicted several times before searches could run properly.)

fouramf-p559l-33-1518-etcd-0                                      1/1     Running                  0                 14h     10.104.4.203    4am-node11   <none>           <none>
fouramf-p559l-33-1518-etcd-1                                      1/1     Running                  0                 14h     10.104.13.165   4am-node16   <none>           <none>
fouramf-p559l-33-1518-etcd-2                                      1/1     Running                  0                 14h     10.104.9.234    4am-node14   <none>           <none>
fouramf-p559l-33-1518-milvus-datacoord-6b95b4f45f-bfcsg           1/1     Running                  0                 14h     10.104.17.168   4am-node23   <none>           <none>
fouramf-p559l-33-1518-milvus-datanode-64fdd568d4-2x2km            1/1     Running                  0                 14h     10.104.9.228    4am-node14   <none>           <none>
fouramf-p559l-33-1518-milvus-indexcoord-66df6d745-b9nbk           1/1     Running                  0                 14h     10.104.13.162   4am-node16   <none>           <none>
fouramf-p559l-33-1518-milvus-indexnode-65755cc48-l8tks            1/1     Running                  0                 14h     10.104.12.233   4am-node17   <none>           <none>
fouramf-p559l-33-1518-milvus-proxy-5f45b67f5c-j5znr               1/1     Running                  0                 14h     10.104.9.227    4am-node14   <none>           <none>
fouramf-p559l-33-1518-milvus-querycoord-b495678cd-dfghm           1/1     Running                  0                 14h     10.104.12.232   4am-node17   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-gtbgj           0/1     Error                    0                 7h4m    10.104.17.89    4am-node23   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-jcp5c           0/1     Error                    0                 6h57m   10.104.15.74    4am-node20   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-l9c8g           0/1     Error                    0                 7h10m   10.104.17.88    4am-node23   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-pfczx           0/1     ContainerStatusUnknown   1                 7h17m   10.104.17.86    4am-node23   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-qbkcj           0/1     Error                    0                 6h40m   10.104.15.105   4am-node20   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-r8kl2           0/1     Error                    0                 6h46m   10.104.15.83    4am-node20   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-wpxpw           1/1     Running                  0                 6h34m   10.104.15.108   4am-node20   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-z2sd8           0/1     Error                    0                 14h     10.104.17.169   4am-node23   <none>           <none>
fouramf-p559l-33-1518-milvus-querynode-7cfc45cc74-zbnlz           0/1     Error                    0                 6h52m   10.104.15.78    4am-node20   <none>           <none>
fouramf-p559l-33-1518-milvus-rootcoord-7cf6c89488-pqdg2           1/1     Running                  0                 14h     10.104.12.230   4am-node17   <none>           <none>
fouramf-p559l-33-1518-minio-0                                     1/1     Running                  0                 14h     10.104.12.237   4am-node17   <none>           <none>
fouramf-p559l-33-1518-minio-1                                     1/1     Running                  0                 14h     10.104.21.40    4am-node24   <none>           <none>
fouramf-p559l-33-1518-minio-2                                     1/1     Running                  0                 14h     10.104.4.204    4am-node11   <none>           <none>
fouramf-p559l-33-1518-minio-3                                     1/1     Running                  0                 14h     10.104.9.232    4am-node14   <none>           <none>
fouramf-p559l-33-1518-pulsar-bookie-0                             1/1     Running                  0                 14h     10.104.12.238   4am-node17   <none>           <none>
fouramf-p559l-33-1518-pulsar-bookie-1                             1/1     Running                  0                 14h     10.104.21.43    4am-node24   <none>           <none>
fouramf-p559l-33-1518-pulsar-bookie-2                             1/1     Running                  0                 14h     10.104.4.207    4am-node11   <none>           <none>
fouramf-p559l-33-1518-pulsar-bookie-init-7qz9r                    0/1     Completed                0                 14h     10.104.21.38    4am-node24   <none>           <none>
fouramf-p559l-33-1518-pulsar-broker-0                             1/1     Running                  0                 14h     10.104.13.163   4am-node16   <none>           <none>
fouramf-p559l-33-1518-pulsar-proxy-0                              1/1     Running                  0                 14h     10.104.4.200    4am-node11   <none>           <none>
fouramf-p559l-33-1518-pulsar-pulsar-init-qnm5x                    0/1     Completed                0                 14h     10.104.21.37    4am-node24   <none>           <none>
fouramf-p559l-33-1518-pulsar-recovery-0                           1/1     Running                  0                 14h     10.104.12.231   4am-node17   <none>           <none>
fouramf-p559l-33-1518-pulsar-zookeeper-0                          1/1     Running                  0                 14h     10.104.9.230    4am-node14   <none>           <none>
fouramf-p559l-33-1518-pulsar-zookeeper-1                          1/1     Running                  0                 14h     10.104.4.209    4am-node11   <none>           <none>
fouramf-p559l-33-1518-pulsar-zookeeper-2                          1/1     Running                  0                 14h     10.104.23.193   4am-node27   <none>           <none>

MinIO disk monitoring:
(screenshot)

@elstic
Contributor Author

elstic commented Jul 26, 2023

The querynode disk usage exceeding 100G also occurs on the master branch: for the same case on master, querynode disk usage reaches 108G.

@yanliang567 yanliang567 modified the milestones: 2.2.12, 2.2.13 Aug 4, 2023
@xige-16
Contributor

xige-16 commented Aug 7, 2023

@elstic Please check if this PR has any effect #25899

@elstic
Contributor Author

elstic commented Aug 8, 2023

@elstic Please check if this PR has any effect #25899

This issue still exists.
Validated image: 2.2.0-20230807-ef31fe23

(screenshot)

@smellthemoon
Contributor

@elstic Please check if this PR has any effect #25899

Perhaps #25896 is the one you mean?

@elstic
Contributor Author

elstic commented Aug 16, 2023

With the recent image '2.2.0-20230814-27fe2a45', inserting 100 million rows and loading the collection succeeded.

@stale

stale bot commented Sep 19, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale (indicates no updates for 30 days) label Sep 19, 2023
@elstic
Contributor Author

elstic commented Sep 20, 2023

This problem has not occurred recently.

@elstic elstic closed this as completed Sep 20, 2023
@elstic
Contributor Author

elstic commented Oct 26, 2023

diskann index inserts 100 million data, querynode disk usage peaks at over 100g
case: test_concurrent_locust_100m_diskann_ddl_dql_filter_cluster

image: master-20231023-0c33ddb7
querynode disk setup (100G):

{'queryNode': {'resources': {'limits': {'cpu': '8', 'memory': '32Gi', 'ephemeral-storage': '100Gi'}}}}

server:

fouramf-hqsbb-36-5136-etcd-0                                      1/1     Running       0               4m21s   10.104.18.122   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-etcd-1                                      1/1     Running       0               4m21s   10.104.23.163   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-etcd-2                                      1/1     Running       0               4m21s   10.104.16.158   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-datacoord-54c74794c9-6b5xv           1/1     Running       0               4m21s   10.104.16.149   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-datanode-5f46b6dd95-jlqbj            1/1     Running       0               4m21s   10.104.16.151   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-indexcoord-55b6fd5f88-9wj5q          1/1     Running       0               4m21s   10.104.16.150   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-indexnode-5ddd6dd445-pdwtt           1/1     Running       0               4m21s   10.104.24.67    4am-node29   <none>           <none>
fouramf-hqsbb-36-5136-milvus-proxy-5d456bd744-v8p6f               1/1     Running       0               4m21s   10.104.16.152   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querycoord-75d48fc58f-grvtt          1/1     Running       0               4m21s   10.104.16.146   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querynode-5ddc775ff-wgtwc            1/1     Running       0               4m21s   10.104.19.139   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-milvus-rootcoord-846bd64d7-4dnvt            1/1     Running       0               4m21s   10.104.16.148   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-minio-0                                     1/1     Running       0               4m21s   10.104.16.155   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-minio-1                                     1/1     Running       0               4m21s   10.104.18.119   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-minio-2                                     1/1     Running       0               4m21s   10.104.23.159   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-minio-3                                     1/1     Running       0               4m21s   10.104.15.212   4am-node20   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-0                             1/1     Running       0               4m21s   10.104.18.123   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-1                             1/1     Running       0               4m21s   10.104.23.169   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-2                             1/1     Running       0               4m20s   10.104.16.160   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-init-hw76w                    0/1     Completed     0               4m21s   10.104.19.137   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-broker-0                             1/1     Running       0               4m21s   10.104.23.156   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-proxy-0                              1/1     Running       0               4m21s   10.104.19.138   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-pulsar-init-74c56                    0/1     Completed     0               4m21s   10.104.19.136   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-recovery-0                           1/1     Running       0               4m21s   10.104.16.147   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-zookeeper-0                          1/1     Running       0               4m21s   10.104.18.120   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-zookeeper-1                          1/1     Running       0               3m41s   10.104.20.110   4am-node22   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-zookeeper-2                          1/1     Running       0               2m58s   10.104.15.225   4am-node20   <none>           <none> (base.py:257)
[2023-10-24 05:29:03,188 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'STATUS|fouramf-hqsbb-36-5136-milvus|fouramf-hqsbb-36-5136-minio|fouramf-hqsbb-36-5136-etcd|fouramf-hqsbb-36-5136-pulsar|fouramf-hqsbb-36-5136-kafka'  (util_cmd.py:14)
[2023-10-24 05:29:12,264 -  INFO - fouram]: [CliClient] pod details of release(fouramf-hqsbb-36-5136): 
I1024 05:29:04.435429     482 request.go:665] Waited for 1.159915509s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/certificates.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS                   RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-hqsbb-36-5136-etcd-0                                      1/1     Running                  0                18h     10.104.18.122   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-etcd-1                                      1/1     Running                  0                18h     10.104.23.163   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-etcd-2                                      1/1     Running                  0                18h     10.104.16.158   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-datacoord-54c74794c9-6b5xv           1/1     Running                  0                18h     10.104.16.149   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-datanode-5f46b6dd95-jlqbj            1/1     Running                  0                18h     10.104.16.151   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-indexcoord-55b6fd5f88-9wj5q          1/1     Running                  0                18h     10.104.16.150   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-indexnode-5ddd6dd445-pdwtt           1/1     Running                  0                18h     10.104.24.67    4am-node29   <none>           <none>
fouramf-hqsbb-36-5136-milvus-proxy-5d456bd744-v8p6f               1/1     Running                  0                18h     10.104.16.152   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querycoord-75d48fc58f-grvtt          1/1     Running                  0                18h     10.104.16.146   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querynode-5ddc775ff-5f6nk            0/1     Error                    0                11h     10.104.20.52    4am-node22   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querynode-5ddc775ff-5spk4            0/1     ContainerStatusUnknown   1                11h     10.104.21.161   4am-node24   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querynode-5ddc775ff-fmcps            0/1     Error                    0                12h     10.104.19.207   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querynode-5ddc775ff-hrbs7            1/1     Running                  0                11h     10.104.18.71    4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querynode-5ddc775ff-mlbkf            0/1     ContainerStatusUnknown   1                12h     10.104.19.208   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-milvus-querynode-5ddc775ff-wgtwc            0/1     Error                    0                18h     10.104.19.139   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-milvus-rootcoord-846bd64d7-4dnvt            1/1     Running                  0                18h     10.104.16.148   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-minio-0                                     1/1     Running                  0                18h     10.104.16.155   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-minio-1                                     1/1     Running                  0                18h     10.104.18.119   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-minio-2                                     1/1     Running                  0                18h     10.104.23.159   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-minio-3                                     1/1     Running                  0                18h     10.104.15.212   4am-node20   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-0                             1/1     Running                  0                18h     10.104.18.123   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-1                             1/1     Running                  0                18h     10.104.23.169   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-2                             1/1     Running                  0                18h     10.104.16.160   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-bookie-init-hw76w                    0/1     Completed                0                18h     10.104.19.137   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-broker-0                             1/1     Running                  0                18h     10.104.23.156   4am-node27   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-proxy-0                              1/1     Running                  0                18h     10.104.19.138   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-pulsar-init-74c56                    0/1     Completed                0                18h     10.104.19.136   4am-node28   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-recovery-0                           1/1     Running                  0                18h     10.104.16.147   4am-node21   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-zookeeper-0                          1/1     Running                  0                18h     10.104.18.120   4am-node25   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-zookeeper-1                          1/1     Running                  0                18h     10.104.20.110   4am-node22   <none>           <none>
fouramf-hqsbb-36-5136-pulsar-zookeeper-2                          1/1     Running                  0                18h     10.104.15.225   4am-node20   <none>           <none> (cli_client.py:132)

The querynode was evicted for using more than 100 gigabytes of disk.

(screenshots)

Validated that peak disk usage stays below 140 GB.

@elstic elstic changed the title [Bug]: [benchmark]diskann index failed to load after inserting 100 million data with excessive disk usage [Bug]: [benchmark] diskann index inserts 100 million data, querynode disk usage peaks at over 100G Oct 26, 2023
@elstic elstic modified the milestones: 2.2.13, 2.3.2 Oct 26, 2023
@elstic elstic reopened this Oct 26, 2023
@elstic elstic removed the stale (indicates no updates for 30 days) label Oct 26, 2023
@yanliang567 yanliang567 modified the milestones: 2.3.2, 2.3.3 Nov 7, 2023
@yanliang567 yanliang567 modified the milestones: 2.3.3, 2.3.4 Nov 16, 2023

stale bot commented Dec 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale (indicates no updates for 30 days) label Dec 16, 2023
@xiaofan-luan
Collaborator

@elstic
is this still a problem?

@stale stale bot removed the stale (indicates no updates for 30 days) label Dec 17, 2023
@elstic
Contributor Author

elstic commented Dec 18, 2023

@elstic is this still a problem?

This issue has not arisen recently and I will close it.

@elstic elstic closed this as completed Dec 18, 2023
@sre-ci-robot
Contributor

@nikcoderr: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xiaofan-luan
Collaborator

Hi, actually I am using a single-node Milvus standalone. How can I set the memory usage for queries? I am facing an issue indexing 100M vector data. I am using the DISKANN index type with the default configuration in the milvus.yaml file:

...
DiskIndex:
  MaxDegree: 56
  SearchListSize: 100
  PQCodeBudgetGBRatio: 0.125
  SearchCacheBudgetGBRatio: 0.125
  BeamWidthRatio: 4.0
...

I have used docker-compose to install milvus 2.4.x

I indexed the data, but afterwards the describe_index function shows: {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 1000}, 'field_name': 'emb', 'index_name': 'vector_index', 'total_rows': 100000000, 'indexed_rows': 100000000, 'pending_index_rows': 0, 'state': 'Finished'}

Also, Milvus went down (I assume the connection was destroyed); I had to bring it down with docker and then run docker-compose up again.

Please help me resolve this issue. Thanks

  1. When you create an index, you have to specify that you are using the DiskANN index; please check the code where you create the index, as I guess you are using the IVF_SQ8 index for now. The config on the server is a default config for DiskANN; it is only used if you create a DiskANN index without specifying index params (see the sketch after this list).

  2. I don't think it's reasonable to run 100m (assuming 768-dim) vector search on a single node, especially with DiskANN. This needs a node with more than 32 cores and 128 GB of memory, and index build and failure recovery will be slow as well. IVF_SQ8 is better at index build speed. You also need a high-performance NVMe SSD to run DiskANN.

  3. I would recommend you check https://zilliz.com/pricing for our Serverless tier and Capacity-optimized instances. If you can't use a fully managed service, I'd be glad to take a call and help you set it up if necessary.
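
A short sketch of point 1 (collection name and endpoint are assumed; the field name emb comes from the describe_index output above): request DISKANN explicitly when creating the index instead of relying on defaults, since the DiskIndex section in milvus.yaml only tunes DiskANN builds and does not select the index type.

from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")  # assumed standalone endpoint
collection = Collection("my_collection")             # assumed existing collection

collection.release()     # index changes require the collection to be released
collection.drop_index()  # drop the existing IVF_SQ8 index first
collection.create_index(
    field_name="emb",
    index_params={"index_type": "DISKANN", "metric_type": "L2", "params": {}},
)
print(collection.index().params)  # should now report DISKANN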


8 participants
@LoveEachDay @xige-16 @elstic @sre-ci-robot @smellthemoon @yanliang567 @xiaofan-luan and others