Longevity test CN and Meta OOM nightly-20240201 #14944
Comments
Meta was OOM-killed during recovery right after compute-0 restarted. I will take a look later.
Also noticed that the average source throughput (5.52 MB/s) is worse than |
For the meta OOM part, I found that it is caused by auto scaling being enabled by default. Currently, when checking and generating scale plans at the beginning of recovery, auto scaling lists all table fragments twice, which triples the memory allocated for this part. @shanicky is writing a PR to avoid these two copies and add some necessary checks to fix this issue. FYI: you can find all memory dump files here, including those for the compute node.
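To illustrate the pattern described above, here is a minimal Python sketch, not RisingWave's actual code. The fragment layout and the function names (`make_fragments`, `plan_with_copies`, `plan_in_place`, `peak_bytes`) are all hypothetical. It only shows why holding two extra full listings next to the original raises peak memory, compared with building the plan from a single pass over the existing data.

```python
import copy
import tracemalloc


def make_fragments(n):
    # Hypothetical stand-in for table fragment metadata (assumption, not the real type).
    return [{"table_id": i, "actors": list(range(8))} for i in range(n)]


def plan_with_copies(fragments):
    # Mimics the reported pattern: two full listings are alive at the same
    # time as the original, so roughly three copies of the data coexist.
    first_listing = copy.deepcopy(fragments)
    second_listing = copy.deepcopy(fragments)
    valid_ids = {f["table_id"] for f in first_listing}
    return {
        f["table_id"]: len(f["actors"])
        for f in second_listing
        if f["table_id"] in valid_ids
    }


def plan_in_place(fragments):
    # Builds the same plan by reading the original list once, without copying it.
    return {f["table_id"]: len(f["actors"]) for f in fragments}


def peak_bytes(fn, fragments):
    # Peak memory allocated while fn runs; the input list itself is not counted
    # because it was created before tracing started.
    tracemalloc.start()
    fn(fragments)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    frags = make_fragments(50_000)
    print("with copies:", peak_bytes(plan_with_copies, frags))
    print("in place:   ", peak_bytes(plan_in_place, frags))
```

Both functions return the same plan; only the peak allocation differs, which is the effect the fix aims to remove.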
Current progress: we found that the memory metrics collected by jemalloc are 2-3 GB smaller than the cluster node memory.
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.