[Bug]: milvus rootcoord always crashes and restarts, the error message is oomkill, the memory has been allocated to 48G #35230
Comments
Are you sure it is the coordinator that reaches the memory limit? From the log, querynode-11298 has run out of memory, and these numbers show that some nodes have rebooted many times. If you do believe the coordinators use that much memory, please provide a pprof file of rootcoord so we can do further analysis.
I don't think it is possible for rootcoord to use a lot of memory in a short period of time.
From the log, rootcoord works fine; I don't see any problems.
@TonyAnn it looks like you have created thousands of collections in the cluster. Handling of that case may already be improved in the latest Milvus release. Could you please retry on Milvus 2.4.7 or 2.3.20?
Even with 10k collections, the memory usage is not expected?
We need a pprof to understand why. Maybe it's due to some kind of retry.
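For anyone who needs to collect the requested profile, here is a minimal sketch of downloading a heap profile over HTTP. It assumes the rootcoord process serves Go's standard `net/http/pprof` handlers; the port `9091` and the helper names `pprof_heap_url` and `save_heap_profile` are assumptions for illustration, so adjust them to your deployment.

```python
"""Sketch: download a heap profile from a Go pprof HTTP endpoint."""
from urllib.parse import urlunsplit
from urllib.request import urlopen


def pprof_heap_url(host: str, port: int = 9091) -> str:
    # /debug/pprof/heap is the standard path served by Go's net/http/pprof.
    # The default port here is an assumption; use the port that exposes
    # pprof for your rootcoord process.
    return urlunsplit(("http", f"{host}:{port}", "/debug/pprof/heap", "", ""))


def save_heap_profile(host: str, out_path: str, port: int = 9091,
                      timeout: float = 30.0) -> int:
    """Fetch the heap profile and write it to out_path; return bytes written."""
    with urlopen(pprof_heap_url(host, port), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

Something like `save_heap_profile("<rootcoord-host>", "rootcoord_heap.pb.gz")` would produce a file that can be attached to the issue or opened locally with `go tool pprof rootcoord_heap.pb.gz`.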
pprof.milvus.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
@TonyAnn also, 2.2.16 is a very old version. There might be optimizations related to this issue in 2.3.x and 2.4.x. It is highly recommended to upgrade your cluster to a more stable version.
@congqixia There are a large number of collections in the cluster. There is a liveness detection script that continuously creates and deletes collections. This script may be the cause.
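One way to check whether the liveness script is leaving collections behind is to list the collection names and filter for the ones the script creates. The sketch below assumes the script uses a recognizable name prefix; `liveness_` is a hypothetical prefix, and the helper names are illustrative. `connections.connect` and `utility.list_collections` are pymilvus calls; the host and port defaults are assumptions.

```python
from typing import Iterable, List


def leftover_probe_collections(names: Iterable[str],
                               prefix: str = "liveness_") -> List[str]:
    """Return, sorted, the collection names that start with the probe prefix."""
    # "liveness_" is a hypothetical prefix; use whatever the script really uses.
    return sorted(n for n in names if n.startswith(prefix))


def list_collection_names(host: str = "localhost",
                          port: str = "19530") -> List[str]:
    """List all collection names in the cluster via pymilvus."""
    # Imported lazily so the pure helper above works without pymilvus installed.
    from pymilvus import connections, utility

    connections.connect(alias="default", host=host, port=port)
    return utility.list_collections()
```

Calling `leftover_probe_collections(list_collection_names())` a few times over several minutes should show whether the number of probe collections keeps growing, which would suggest the script's deletes are not keeping up with its creates.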
OK, it will be upgraded in a while.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is there an existing issue for this?
Environment
Current Behavior
The Milvus rootcoord repeatedly crashes and restarts. The reported error is an OOM kill, even though 48G of memory has been allocated to it.
Expected Behavior
The detailed rootcoord log is shown in the attachment
rootcoord_280160cbb3b-json.log
rootcoord_json.log
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response