[Bug]: [Nightly] Load collection occasionally failed for timeout #37272
Comments
/assign @czs007 |
/assign @bigsheeper |
same as #30301 |
@bigsheeper Please assign this back for verification |
/assign @NicoYuan1986 This should be fixed, please help verify. |
@bigsheeper Can you help take a look? |
@SimFG The switch for concurrent DDL execution is not enabled: |
/assign @NicoYuan1986 |
Did not reproduce in https://jenkins.milvus.io:18080/blue/rest/organizations/jenkins/pipelines/Milvus%20Nightly%20CI(new)/branches/master/runs/180/nodes/66/steps/137/log/?start=0 |
Reproduced in: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI(new)/detail/master/182/pipeline/114 |
The nightly CI uses only one database, and when create collection is slow (p99 reaching 13 seconds), it still blocks operations like showPartitions, causing timeouts. PR #37352 cannot optimize this scenario. Perhaps we should investigate why create collection is so slow. Any comments? @xiaofan-luan @SimFG |
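A minimal sketch of the blocking described above, using hypothetical names rather than the real task scheduler: if create collection holds a database-level write lock for ~13 seconds while showPartitions only needs a read lock on the same database, the waiting read call exceeds its RPC timeout.

```go
// Minimal sketch (hypothetical, not Milvus's actual scheduler): a slow
// CreateCollection holding the per-database write lock blocks read-only
// calls such as ShowPartitions until it finishes.
package main

import (
	"fmt"
	"sync"
	"time"
)

type dbLock struct{ mu sync.RWMutex } // one lock per database (hypothetical)

func createCollection(l *dbLock, d time.Duration) {
	l.mu.Lock() // write lock: excludes everything else in the same database
	defer l.mu.Unlock()
	time.Sleep(d) // stands in for slow meta writes / etcd fsync
}

func showPartitions(l *dbLock, timeout time.Duration) error {
	done := make(chan struct{})
	go func() {
		l.mu.RLock() // read lock: blocked while createCollection holds the write lock
		l.mu.RUnlock()
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("ShowPartitions timed out after %v", timeout)
	}
}

func main() {
	var l dbLock
	go createCollection(&l, 13*time.Second) // the slow DDL observed in CI
	time.Sleep(100 * time.Millisecond)      // let it grab the lock first
	fmt.Println(showPartitions(&l, 10*time.Second))
}
```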
@SimFG @bigsheeper I think there is definitely a regression here because we didn't see any problems here before. @SimFG could you help analyze why create collection has become slow? |
cool work dude |
We've been working on the collection number and partition number optimization, stay tuned! |
/assign @bigsheeper |
func (t *createCollectionTask) GetLockerKey() LockerKey
Is there a special reason that create collection needs a database-level write key? |
Meanwhile, 10s might be too long for create collection; we need to investigate it. |
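To make the locker-key question above concrete, here is an illustrative sketch with hypothetical types (not the actual rootcoord LockerKey implementation): the behavior described in this thread takes a database-level write key, while a finer-grained alternative would read-lock the database and write-lock only the new collection, so other requests in the same database are not serialized behind create collection.

```go
// Hypothetical lock-key model for illustration only.
package main

import "fmt"

type LockLevel int

const (
	DatabaseLevel LockLevel = iota
	CollectionLevel
)

type LockerKey struct {
	Level LockLevel
	Key   string
	Write bool
}

// Behavior as described in the thread: one database-level write key.
func dbLevelKey(dbName string) []LockerKey {
	return []LockerKey{{Level: DatabaseLevel, Key: dbName, Write: true}}
}

// Possible finer-grained alternative: read the database, write only the new collection.
func collectionLevelKey(dbName, collection string) []LockerKey {
	return []LockerKey{
		{Level: DatabaseLevel, Key: dbName, Write: false},
		{Level: CollectionLevel, Key: dbName + "/" + collection, Write: true},
	}
}

func main() {
	fmt.Println(dbLevelKey("default"))
	fmt.Println(collectionLevelKey("default", "new_collection"))
}
```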
We analyzed a create collection process with a trace ID of |
Too many collections lead to too much metadata. This doesn't sound like a 2.5-blocking issue. |
Which environment is it? |
Do we have disk utilization monitoring? |
I know PingCAP runs etcd at around 50GB and they don't seem to hit issues like ours. Maybe we should try tuning some etcd params and see if it helps. |
IOPS is not high, but this is the Milvus disk, not etcd's. |
@xiaofan-luan I checked another case of drop collection timeout failure and found that the fsync delay was also high. Then I checked the monitoring on the physical machine where etcd is located and found that the I/O seemed relatively high during this period. |
Checking etcd metrics |
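One quick way to check this, sketched under the assumption of a default local etcd endpoint at 127.0.0.1:2379: scrape /metrics and look at the WAL fsync and backend commit duration histograms; high values there usually point at slow disk I/O under etcd, for example when it shares a disk with MinIO or other heavy writers.

```go
// Hedged helper sketch: print etcd's fsync/commit latency histograms
// from its Prometheus /metrics endpoint (adjust the URL to your deployment).
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://127.0.0.1:2379/metrics") // assumed local etcd endpoint
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "etcd_disk_wal_fsync_duration_seconds") ||
			strings.HasPrefix(line, "etcd_disk_backend_commit_duration_seconds") {
			fmt.Println(line)
		}
	}
}
```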
@liliu-z I think it's consistent |
Maybe we should avoid deploying etcd together with MinIO. |
/assign @yanliang567 |
/unassign @SimFG |
Not reproduced in recent days; closing it now. |
Is there an existing issue for this?
Environment
Current Behavior
Load collection occasionally failed for timeout.
Expected Behavior
pass
Steps To Reproduce
No response
Milvus Log
Anything else?
No response