Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: memory leak from policy registration #1687

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rodion-lim-partior
Copy link
Contributor

@rodion-lim-partior rodion-lim-partior commented Dec 11, 2023

Start a QBFT/IBFT cluster with an arbitrary number of non-validator nodes. Memory usage of non-validator nodes grows indefinitely, causing OOM eventually (Reference issue #1660). Issue can easily be reproduced by logging out the length of validator sets within the policy's registry at runtime.

Non validator nodes calls RegisterValidatorSet method for every single block, adding on to the in memory ProposerPolicy.registry. GC is unable to free the memory since ClearRegistry is never called. There are cases when a restart of the quorum node causes even validator nodes to indefinitely add new validator sets to the registry.

Registry is currently not used in any of the other implementation code, this PR sets a cap to the maximum number of validator sets allowed in memory at any one point in time (in the event when registry is planned for future use cases). It caps the registry up to the latest 128 blocks, aligned to the number of in-memory state transitions. Older registrations are sequentially booted out of the registry.

@rodion-lim-partior rodion-lim-partior changed the title fix: memory leak from policy registration (#1660) fix: memory leak from policy registration Dec 11, 2023
@frank-lim-partior
Copy link
Contributor

@YoshihitoAso, this is possibly related to the issue you raised in BoostryJP/quorum#56

@cheeweng-tan-partior
Copy link

cheeweng-tan-partior commented Dec 13, 2023

After implementing the fix, we conducted a thorough profiling analysis on the non-validator nodes. The observed memory behavior now indicates a significant improvement, with memory consumption stabilizing instead of exhibiting indefinite growth, as previously observed.

Please refer to the attached memory consumption data for the past 3 days.
image

We used pprof snapshots to compare the memory usage before and after the fix. Here are the key observation:

  • Before the Fix: The pprof snapshot distinctly reveals that the predominant portion of memory consumption stemmed from 'validator.newDefaultSet'.
    image

  • After the Fix: Upon re-profiling, 'validator.newDefaultSet' no longer appears in the top 20 memory consumers. Instead, only 'leveldb' remains as one of the main top 5 memory consumers, which is expected during block minting.
    image

In summary:
The fix has effectively addressed memory leakage that was observed, notably, 'validator.newDefaultSet' no longer contributes to excessive memory consumption post-fix, aligning with expected behavior. These observations highlight the efficacy of the fix to show a positive results in mitigating memory leakage and enhancing overall system stability / reliability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants