Bug (compactor): Investigate compactor oom bug and improve memory allocation #15022

Li0k · 2024-02-06T06:18:38Z

Background

The phenomenon of compactor oom occurs during the compactor-test.

We found that when the test is first started, compactor's memory spikes and experiences oom, and stabilizes after a while.

Investigate

I turned on the memory profiler and reran the test, and found a few things.

After the introduction of check_compact_result, it was taking up some of the memory. I found out that the memory tracker is not maintained during the execution of check_compact_result, which could lead to a memory overflow.

Unfortunately, this only delayed the appearance of oom and did not solve the problem. so continued to analyze the metrics and the profile.There was some interesting information in the metrics that helped me investigate the cause of the problem: Why is the oom only appearing at startup? A simple way to think about it is to look for fluctuations in the same time.

multi version key

k-v pair size

epoch

According to the above metrics, we can find that in the startup phase of the test (10min), min_epoch is pinned, which leads to a large number of multi-version keys in the system, and the size of k-v pairs in the test is larger than that of a normal nexmark test.

Analyzing further, based on the profile results, we can locate the problem in the encode and decode phases of the block.
1.

From the profile analysis, BlockBuilder::add consumes a lot of memory, based on the metrics above, it may be due to the effect of multiple key versions. Reading the code further, after 8136111, the builder tends to place the same user_key in the block.

TLDR: the limit of is_new_user_key could lead to huge blocks under the current test, which are not covered by the memory estimation algorithm, and trigger more memory allocations.

The text was updated successfully, but these errors were encountered:

Li0k · 2024-02-06T06:53:17Z

Improvement

Based on the information in the Memory profile, we found some points that can be optimized.

BlockBuilder::Add

BlockBuilder preallocates the buf at startup to minimize secondary memory allocation.

There are two problems:

the extra 256 allocated may not be enough after the introduction of the type index, and a large number of keys will trigger an allocation of
During compress, the buf variable is changed, invalidating the expected pre-allocation.

BlockBuilder::compress

compress function will generate a new BytesMut write each time for encoding

Block::decode_with_copy

~~Crete Bytes from Vec lead to memory reallocation.~~

    /// Converts the vector into [`Box<[T]>`][owned slice].
    ///
    /// If the vector has excess capacity, its items will be moved into a
    /// newly-allocated buffer with exactly the right capacity.
    #[cfg(not(no_global_oom_handling))]
    #[stable(feature = "rust1", since = "1.0.0")]
    pub fn into_boxed_slice(mut self) -> Box<[T], A> {
        unsafe {
            self.shrink_to_fit();
            let me = ManuallyDrop::new(self);
            let buf = ptr::read(&me.buf);
            let len = me.len();
            buf.into_box(len).assume_init()
        }
    }

Change

reserve more space for type index
change the condition for determining block full to avoid block size exceeding capacity as much as possible.
Introduce a dedicated compress buf
~~4. refactor the usage of Bytes~~

Result

hzxa21 · 2024-02-18T09:19:20Z

From the profile analysis, BlockBuilder::add consumes a lot of memory, based on the metrics above, it may be due to the effect of multiple key versions. Reading the code further, after 8136111, the builder tends to place the same user_key in the block.

What is the benefit of putting all versions of a user key into a single block? I guess it is a bug (fixed by #15023)?

Li0k added the type/feature label Feb 6, 2024

github-actions bot added this to the release-1.7 milestone Feb 6, 2024

Li0k added type/bug Something isn't working component/storage Storage labels Feb 6, 2024

Li0k self-assigned this Feb 6, 2024

This was referenced Feb 6, 2024

fix(storage): fix compactor oom 0206 #15023

Merged

feat(storage): improve block memory usage #15024

Merged

Li0k closed this as completed Mar 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug (compactor): Investigate compactor oom bug and improve memory allocation #15022

Bug (compactor): Investigate compactor oom bug and improve memory allocation #15022

Li0k commented Feb 6, 2024 •

edited

Loading

Li0k commented Feb 6, 2024 •

edited

Loading

hzxa21 commented Feb 18, 2024 •

edited

Loading

Bug (compactor): Investigate compactor oom bug and improve memory allocation #15022

Bug (compactor): Investigate compactor oom bug and improve memory allocation #15022

Comments

Li0k commented Feb 6, 2024 • edited Loading

Background

Investigate

Li0k commented Feb 6, 2024 • edited Loading

Improvement

Change

Result

hzxa21 commented Feb 18, 2024 • edited Loading

Li0k commented Feb 6, 2024 •

edited

Loading

Li0k commented Feb 6, 2024 •

edited

Loading

hzxa21 commented Feb 18, 2024 •

edited

Loading