RSS memory increasing when writing data, causing node Out Of Memory #1487
Comments
@IAmFQQ Did you try jemalloc instead of tcmalloc? We have seen jemalloc make significant improvements on memory fragmentation. We haven't benchmarked its impact on forcemerge time yet, but if possible, it would be good to get a data point. Additionally, faiss hnsw can be tried; in that implementation, there are fewer, smaller allocs made. Related issues: |
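For anyone who wants to try this suggestion, here is a minimal sketch of preloading jemalloc into the OpenSearch process; the package name and library path are assumptions and vary by distribution.

```bash
# Sketch only: swap glibc ptmalloc for jemalloc via LD_PRELOAD.
# The package name and .so path below are assumptions (Debian/Ubuntu-style);
# locate the actual libjemalloc on your hosts first.
sudo apt-get install -y libjemalloc2
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
./bin/opensearch   # OpenSearch now allocates through jemalloc instead of ptmalloc
```

The same LD_PRELOAD approach can be used to test tcmalloc (libtcmalloc from gperftools) for comparison.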
Thanks @jmazanec15 for your suggestions, very much appreciated. We will run the tests and get back to you next week. Test cases:
|
Sounds good. Yes, please let us know. |
Hey @IAmFQQ, were you able to test? |
Hi @jmazanec15, we will have the test report by the end of this week and will share it with you. |
Hi @jmazanec15, we tested nmslib hnsw using jemalloc; the forcemerge time did not drop much, about the same as with tcmalloc. To use faiss hnsw, we have to change the space_type from cosinesimil to l2, which we have not tested yet. It requires a reassessment of the scores, and our upstream application needs to evaluate it. I will keep you posted. |
@IAmFQQ thanks.
I'm trying to understand why this might be the case. Any ideas? I'll try to run an experiment locally. |
I am also very curious to understand how this could impact forcemerge time, i.e. how the strategy of reclaiming free space impacts the graph build time. Is it because we are running in a constrained environment and there are pauses to free up space? Anything interesting you noticed in the logs would be helpful. |
@vamshin @jmazanec15 However, our regular statistics tool only counts the last run, which leads to the counting error. Thanks for your help @jmazanec15. So the solution to avoid OOM is to use tcmalloc/jemalloc. |
@IAmFQQ Oh good to hear. Did you notice any diff between tcmalloc and jemalloc? |
Also, I ran some micro-benchmarks based on https://github.com/jmazanec15/k-NN-1/blob/micro-benchmarks/micro-benchmarks/src/main/java/org/opensearch/knn/BuildNativeIndexBenchmarks.java and found that for 250K vectors, jemalloc makes building graphs a little bit faster:

# Default malloc
Benchmark                                    (dimension) (efConstruction) (engine) (indexThreadQty) (m)  (spaceType)  Mode Cnt   Score   Error Units
BuildNativeIndexBenchmarks.buildNativeIndex         128              100   nmslib                4  16 innerproduct    ss   4   67.803 ± 1.715  s/op
BuildNativeIndexBenchmarks.buildNativeIndex         128              100   nmslib                4  16           l2    ss   4   70.823 ± 1.086  s/op
BuildNativeIndexBenchmarks.buildNativeIndex         512              100   nmslib                4  16 innerproduct    ss   4  176.794 ± 0.491  s/op
BuildNativeIndexBenchmarks.buildNativeIndex         512              100   nmslib                4  16           l2    ss   4  169.535 ± 3.413  s/op

# Jemalloc
Benchmark                                    (dimension) (efConstruction) (engine) (indexThreadQty) (m)  (spaceType)  Mode Cnt   Score   Error Units
BuildNativeIndexBenchmarks.buildNativeIndex         128              100   nmslib                4  16 innerproduct    ss   4   62.201 ± 0.763  s/op
BuildNativeIndexBenchmarks.buildNativeIndex         128              100   nmslib                4  16           l2    ss   4   66.505 ± 0.935  s/op
BuildNativeIndexBenchmarks.buildNativeIndex         512              100   nmslib                4  16 innerproduct    ss   4  170.340 ± 2.244  s/op
BuildNativeIndexBenchmarks.buildNativeIndex         512              100   nmslib                4  16           l2    ss   4  164.709 ± 5.022  s/op

Checking to see if the same applies for faiss. |
Faiss: about the same. |
Closing, no activity. |
What is the bug?
Node OOM when writing data to a KNN index.
RSS memory monitoring
How can one reproduce the bug?
KNN field mappings
index settings
one index
cluster settings
The above index/cluster settings are meant to reduce the number of segments generated; fewer segments help us reduce the force merge time, and fewer segments also benefit KNN search performance.
We got these settings from our L&P tests.
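The actual mappings and settings above were attached as screenshots. Purely as an illustration of the kind of configuration being described (index name, field name, dimension, and setting values below are made-up placeholders; nmslib hnsw with cosinesimil is taken from elsewhere in this thread), it might look something like this:

```bash
# Illustrative sketch only; the reporter's real values were not included here.
curl -XPUT "http://localhost:9200/knn-index-000001" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.knn": true,
    "index.refresh_interval": "300s",
    "index.translog.flush_threshold_size": "2gb",
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "nmslib",
          "space_type": "cosinesimil",
          "parameters": { "m": 16, "ef_construction": 100 }
        }
      }
    }
  }
}'
```

Longer refresh intervals and larger flush thresholds produce fewer, larger segments, which is the stated goal here.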
What is the expected behavior?
We roll over our index daily and delete old ones, keeping 3 indexes.
But the RSS memory (from top or pmap) keeps increasing day by day, and eventually the node goes down with an OOM.
This is the monitoring of the RSS memory; it matches the output of the top command.
With the above index settings, the flush (fsync and commit) happens in the background every 300s or when the index_buffer_size is full. After writing the data, we submit a force merge request.
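A sketch of the kind of force merge request being described (the index name and target segment count are assumptions):

```bash
# Assumed example: after ingestion finishes, merge the day's index down to one segment.
curl -XPOST "http://localhost:9200/knn-index-000001/_forcemerge?max_num_segments=1"
```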
From the RSS memory trend, we can see that RSS memory gradually grows while writing to the translog. During force merge there is some decrease, but not much; the overall trend is upward.
On 2-18, the 3rd day, when we pushed data to the new index (writing to the translog), the node crashed with an OOM.
The Java heap was not overused. This is the monitoring of the Java heap.
What is your host/environment?
There are many 64MB and 128MB memory blocks.
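For reference, a sketch of how such blocks can be inspected with pmap (the pid is a placeholder to be replaced with the OpenSearch process id); runs of large anonymous mappings around 64MB are the typical signature of glibc ptmalloc arenas:

```bash
# Sketch: show the largest resident mappings of the OpenSearch process,
# sorted by RSS (third column of `pmap -x` output).
pmap -x <opensearch_pid> | sort -n -k3 | tail -n 20
```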
We thought it might be the same as #772.
So we took the code of nmslib_wrapper.cpp from 2.8 and replaced the same file in 2.7
(refer to https://github.com/opensearch-project/k-NN/blame/f11f1f1d4ad0de76b05517b57bcc87e0a6788031/jni/src/nmslib_wrapper.cpp#L129),
then rebuilt the OpenSearch & k-NN image and deployed it.
The issue is not gone; the nodes still OOM.
Finally, we used tcmalloc to replace ptmalloc in glibc. The OOM is gone, but the force merge time of the index increased from 1h to 4h, which is not acceptable.
Our questions?