Background
The compactor runs out of memory (OOM) during the compactor-test.
We found that when the test first starts, the compactor's memory spikes and it hits OOM; memory usage stabilizes after a while.
Investigate
I turned on the memory profiler, reran the test, and found a few things.
Since the introduction of check_compact_result, it has been taking up some of the memory. I found that the memory tracker is not maintained while check_compact_result executes, which could lead to a memory overflow.
Unfortunately, fixing this only delayed the OOM and did not solve the problem, so I continued to analyze the metrics and the profile. The metrics contained some interesting information that helped me investigate the cause: why does the OOM appear only at startup? A simple approach is to look for metrics that fluctuate over the same time window.
[Metric charts: multi version key, k-v pair size, epoch]
According to the above metrics, min_epoch is pinned during the startup phase of the test (10min), which leads to a large number of multi-version keys in the system. The k-v pairs in this test are also larger than those of a normal nexmark test.
Analyzing further with the profile results, we can locate the problem in the encode and decode phases of the block.
1. From the profile analysis, BlockBuilder::add consumes a lot of memory. Based on the metrics above, this may be caused by the large number of versions per key. Reading the code further, after 8136111 the builder tends to keep entries with the same user_key in the same block.
TL;DR: the is_new_user_key restriction can produce huge blocks in the current test. Such blocks are not covered by the memory estimation algorithm and trigger additional memory allocations.
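The effect described above can be sketched with a simplified model. Everything in it is hypothetical (the `BlockSketch` type, its fields, the 8-byte epoch stand-in, and the `split_only_on_new_user_key` flag); it is not the real builder, only an illustration of how refusing to seal a block until the user key changes lets one block grow far past its target capacity when a key has many versions.

```rust
/// Hypothetical, simplified model of a block builder (not the real code).
/// Each entry is one version (epoch) of a user key.
struct BlockSketch {
    capacity: usize,                 // target block size in bytes
    size: usize,                     // bytes in the currently open block
    last_user_key: Option<Vec<u8>>,  // user key of the previous entry
    finished: Vec<usize>,            // sizes of sealed blocks
}

impl BlockSketch {
    fn new(capacity: usize) -> Self {
        Self { capacity, size: 0, last_user_key: None, finished: Vec::new() }
    }

    /// `split_only_on_new_user_key` models the `is_new_user_key` restriction:
    /// when set, a full block is sealed only once the user key changes.
    fn add(&mut self, user_key: &[u8], value_len: usize, split_only_on_new_user_key: bool) {
        let is_new_user_key = self.last_user_key.as_deref() != Some(user_key);
        let full = self.size >= self.capacity;
        let may_split = !split_only_on_new_user_key || is_new_user_key;
        if full && may_split {
            self.finished.push(self.size);
            self.size = 0;
        }
        // 8 bytes stand in for the epoch suffix of the full key (assumption).
        self.size += user_key.len() + 8 + value_len;
        self.last_user_key = Some(user_key.to_vec());
    }

    fn finish(mut self) -> Vec<usize> {
        if self.size > 0 {
            self.finished.push(self.size);
        }
        self.finished
    }
}

/// Writes `versions` versions of a single user key and returns the block sizes.
fn block_sizes(versions: usize, value_len: usize, capacity: usize, restrict: bool) -> Vec<usize> {
    let mut builder = BlockSketch::new(capacity);
    for _ in 0..versions {
        builder.add(b"k1", value_len, restrict);
    }
    builder.finish()
}
```

With 100 versions of one key and 1000-byte values, the restricted variant ends up with a single block of roughly 100 KB against a 4 KB target, while the unrestricted variant seals a block shortly after each time the target is reached.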
Based on the information in the memory profile, we found several points that can be optimized.
BlockBuilder::add
BlockBuilder preallocates the buf at startup to minimize secondary memory allocation.
There are two problems:
the extra 256 allocated may not be enough after the introduction of the type index, and a large number of keys will trigger an allocation of
During compress, the buf variable is changed, invalidating the expected pre-allocation.
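Both problems can be illustrated with plain `Vec<u8>` buffers. This is a sketch under assumptions: the `EXTRA` constant mirrors the 256 extra bytes mentioned above, while the function names and the "compression" stand-in are made up for illustration and do not come from the real builder.

```rust
/// Extra bytes reserved on top of the block capacity (mirrors the 256 above).
const EXTRA: usize = 256;

/// Problem 1: counts how often the buffer has to grow while `total` bytes are
/// appended in `chunk`-sized writes, starting from `capacity + EXTRA` reserved bytes.
fn count_reallocations(capacity: usize, total: usize, chunk: usize) -> usize {
    let mut buf: Vec<u8> = Vec::with_capacity(capacity + EXTRA);
    let mut grows = 0;
    let mut written = 0;
    while written < total {
        let n = chunk.min(total - written);
        let before = buf.capacity();
        buf.resize(buf.len() + n, 0);
        if buf.capacity() != before {
            grows += 1; // the reservation was exceeded and the Vec grew
        }
        written += n;
    }
    grows
}

/// Problem 2: replacing `buf` with a freshly produced vector discards the
/// reservation. Returns (capacity before the swap, capacity after the swap).
fn capacity_after_swap(capacity: usize) -> (usize, usize) {
    let mut buf: Vec<u8> = Vec::with_capacity(capacity + EXTRA);
    let reserved = buf.capacity();
    buf.extend_from_slice(&[0u8; 16]);
    // Stand-in for compression output: a new, much smaller vector.
    let compressed: Vec<u8> = buf.iter().copied().take(8).collect();
    buf = compressed;
    (reserved, buf.capacity())
}
```

A block that stays within the reservation never grows the buffer, whereas an oversized block grows it repeatedly, and after the swap the next block starts from a small buffer again.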
BlockBuilder::compress
The compress function creates a new BytesMut writer for encoding on every call.
Block::decode_with_copy
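Creating Bytes from a Vec leads to memory reallocation: as the standard library code below shows, converting a vector with excess capacity into a boxed slice moves its items into a newly allocated buffer.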
    /// Converts the vector into [`Box<[T]>`][owned slice].
    ///
    /// If the vector has excess capacity, its items will be moved into a
    /// newly-allocated buffer with exactly the right capacity.
    #[cfg(not(no_global_oom_handling))]
    #[stable(feature = "rust1", since = "1.0.0")]
    pub fn into_boxed_slice(mut self) -> Box<[T], A> {
        unsafe {
            self.shrink_to_fit();
            let me = ManuallyDrop::new(self);
            let buf = ptr::read(&me.buf);
            let len = me.len();
            buf.into_box(len).assume_init()
        }
    }
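The shrink can be observed with the standard library alone. The helper below is a hypothetical illustration (its name and parameters are not from the codebase): it reserves more capacity than it uses, converts the vector to a boxed slice, and reports the capacity before and after.

```rust
/// Returns (capacity before, capacity after) a round trip through `Box<[u8]>`.
/// `reserved` should be at least `len` so the vector has excess capacity.
fn boxed_slice_capacity(reserved: usize, len: usize) -> (usize, usize) {
    let mut v: Vec<u8> = Vec::with_capacity(reserved);
    v.resize(len, 0);
    let before = v.capacity();
    // `into_boxed_slice` calls `shrink_to_fit` first, dropping the excess capacity.
    let boxed: Box<[u8]> = v.into_boxed_slice();
    // A boxed slice carries no spare capacity, so the Vec built from it has capacity == len.
    let after = boxed.into_vec().capacity();
    (before, after)
}
```

Whether the shrink copies the data or resizes in place is up to the allocator, but either way a vector whose capacity already equals its length skips that step.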
Change
1. Reserve more space for the type index.
2. Change the condition for determining that a block is full, so that the block size exceeds the capacity as rarely as possible.
3. Introduce a dedicated compress buf.
4. Refactor the usage of Bytes.
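A minimal sketch of the dedicated compress buffer idea follows, assuming a builder that owns two long-lived buffers. The `BuilderSketch` type is hypothetical, and the toy run-length encoding only stands in for the real compression so that the example needs no external crates.

```rust
/// Hypothetical builder that keeps one buffer for raw entries and a separate,
/// reusable buffer for compressed output.
struct BuilderSketch {
    buf: Vec<u8>,
    compress_buf: Vec<u8>,
}

impl BuilderSketch {
    fn new(capacity: usize) -> Self {
        Self {
            buf: Vec::with_capacity(capacity),
            compress_buf: Vec::with_capacity(capacity),
        }
    }

    fn add(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
    }

    /// Encodes `buf` into `compress_buf` as (run length, byte) pairs, then
    /// clears `buf` without releasing its allocation. Toy encoding only;
    /// its worst case is twice the input size, so `compress_buf` may still grow.
    fn compress(&mut self) -> &[u8] {
        self.compress_buf.clear();
        let mut i = 0;
        while i < self.buf.len() {
            let byte = self.buf[i];
            let mut run = 1usize;
            while i + run < self.buf.len() && self.buf[i + run] == byte && run < 255 {
                run += 1;
            }
            self.compress_buf.push(run as u8);
            self.compress_buf.push(byte);
            i += run;
        }
        self.buf.clear();
        &self.compress_buf
    }
}
```

Because `clear` keeps the allocation, both buffers are reused across blocks instead of being recreated on every `compress` call.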
From the profile analysis, BlockBuilder::add consumes a lot of memory, based on the metrics above, it may be due to the effect of multiple key versions. Reading the code further, after 8136111, the builder tends to place the same user_key in the block.
What is the benefit of putting all versions of a user key into a single block? I guess it is a bug (fixed by #15023)?