Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug (compactor): Investigate compactor oom bug and improve memory allocation #15022

Closed
Li0k opened this issue Feb 6, 2024 · 2 comments
Closed
Assignees
Labels
component/storage Storage type/bug Something isn't working type/feature
Milestone

Comments

@Li0k
Copy link
Contributor

Li0k commented Feb 6, 2024

Background

The phenomenon of compactor oom occurs during the compactor-test.
image

We found that when the test is first started, compactor's memory spikes and experiences oom, and stabilizes after a while.

Investigate

I turned on the memory profiler and reran the test, and found a few things.
image

After the introduction of check_compact_result, it was taking up some of the memory. I found out that the memory tracker is not maintained during the execution of check_compact_result, which could lead to a memory overflow.

image

Unfortunately, this only delayed the appearance of oom and did not solve the problem. so continued to analyze the metrics and the profile.There was some interesting information in the metrics that helped me investigate the cause of the problem: Why is the oom only appearing at startup? A simple way to think about it is to look for fluctuations in the same time.

  1. multi version key
image
  1. k-v pair size
image
  1. epoch
image

According to the above metrics, we can find that in the startup phase of the test (10min), min_epoch is pinned, which leads to a large number of multi-version keys in the system, and the size of k-v pairs in the test is larger than that of a normal nexmark test.

Analyzing further, based on the profile results, we can locate the problem in the encode and decode phases of the block.
1.
image

image

image

From the profile analysis, BlockBuilder::add consumes a lot of memory, based on the metrics above, it may be due to the effect of multiple key versions. Reading the code further, after 8136111, the builder tends to place the same user_key in the block.

image

TLDR: the limit of is_new_user_key could lead to huge blocks under the current test, which are not covered by the memory estimation algorithm, and trigger more memory allocations.

@github-actions github-actions bot added this to the release-1.7 milestone Feb 6, 2024
@Li0k Li0k added type/bug Something isn't working component/storage Storage labels Feb 6, 2024
@Li0k Li0k self-assigned this Feb 6, 2024
@Li0k
Copy link
Contributor Author

Li0k commented Feb 6, 2024

Improvement

Based on the information in the Memory profile, we found some points that can be optimized.

  1. BlockBuilder::Add

image

BlockBuilder preallocates the buf at startup to minimize secondary memory allocation.

image

There are two problems:

  • the extra 256 allocated may not be enough after the introduction of the type index, and a large number of keys will trigger an allocation of

  • During compress, the buf variable is changed, invalidating the expected pre-allocation.

image
  1. BlockBuilder::compress

image

compress function will generate a new BytesMut write each time for encoding

image
  1. Block::decode_with_copy

image

Crete Bytes from Vec lead to memory reallocation.

    /// Converts the vector into [`Box<[T]>`][owned slice].
    ///
    /// If the vector has excess capacity, its items will be moved into a
    /// newly-allocated buffer with exactly the right capacity.
    #[cfg(not(no_global_oom_handling))]
    #[stable(feature = "rust1", since = "1.0.0")]
    pub fn into_boxed_slice(mut self) -> Box<[T], A> {
        unsafe {
            self.shrink_to_fit();
            let me = ManuallyDrop::new(self);
            let buf = ptr::read(&me.buf);
            let len = me.len();
            buf.into_box(len).assume_init()
        }
    }
image

Change

  1. reserve more space for type index
  2. change the condition for determining block full to avoid block size exceeding capacity as much as possible.
  3. Introduce a dedicated compress buf
    4. refactor the usage of Bytes

Result

image
image
image

@hzxa21
Copy link
Collaborator

hzxa21 commented Feb 18, 2024

From the profile analysis, BlockBuilder::add consumes a lot of memory, based on the metrics above, it may be due to the effect of multiple key versions. Reading the code further, after 8136111, the builder tends to place the same user_key in the block.

What is the benefit of putting all versions of a user key into a single block? I guess it is a bug (fixed by #15023)?

@Li0k Li0k closed this as completed Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/storage Storage type/bug Something isn't working type/feature
Projects
None yet
Development

No branches or pull requests

2 participants