Optimize garbage collection #80

Open
domenicquirl opened this issue Mar 28, 2020 · 4 comments
Labels
discussion (Looking for input / ideas on this issue) · enhancement (New feature or request) · performance (This issue targets performance improvements)

Comments

@domenicquirl
Collaborator

In the discussion about performance improvements (#50), a point that arose repeatedly is the impact of the crossbeam_epoch-based garbage collection we use to track map entries (nodes and values).

There are several angles from which we might look at reducing this impact. This issue is meant to be for discussion and changes around how and when our code interacts with garbage collection. Ideas floated in #72 (review) and previous comments may serve as a starting point. They include:

  • moving direct interactions with crossbeam_epoch, e.g. in the form of defer_destroy, out of critical sections in favour of a simpler collection mechanism for things to be deleted. The collected entities could then be handed off to garbage collection after dropping the lock for the critical section (a rough sketch follows this list).
    It is not yet clear whether this would yield a significant improvement over calling defer_destroy directly from within the critical section (as happens currently), since crossbeam essentially also stores a closure for the drop and takes some measures to optimize the storage of these closures (for example, a deferred closure only requires a heap allocation if it exceeds a certain size).
  • optimizing defer_destroy itself and how we use it. See again Implement TreeBins #72 (review).
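
For illustration, a minimal sketch of the first idea, assuming a simplified bin guarded by a Mutex with a crossbeam_epoch-managed list (the `Bin` and `Node` types here are stand-ins, not flurry's actual ones): nodes are only gathered while the lock is held, and `defer_destroy` runs after the critical section ends.

```rust
use std::sync::atomic::Ordering;
use std::sync::Mutex;

use crossbeam_epoch::{self as epoch, Atomic, Shared};

// Illustrative stand-ins, not flurry's real types.
struct Node {
    value: usize,
    next: Atomic<Node>,
}

struct Bin {
    lock: Mutex<()>,
    head: Atomic<Node>,
}

// Unlink everything while holding the lock, but only hand the garbage to
// crossbeam_epoch once the critical section is over.
fn clear_bin(bin: &Bin) {
    let guard = epoch::pin();
    // Nodes unlinked inside the critical section are gathered here instead of
    // being passed to `defer_destroy` under the lock.
    let mut garbage = Vec::new();

    {
        let _critical = bin.lock.lock().unwrap();
        let mut cur = bin.head.swap(Shared::null(), Ordering::AcqRel, &guard);
        while !cur.is_null() {
            // Safety: `cur` was just unlinked from `bin.head` while holding
            // the lock, so no other thread can gain a new reference to it.
            let next = unsafe { cur.deref() }.next.load(Ordering::Acquire, &guard);
            garbage.push(cur);
            cur = next;
        }
    } // lock released here

    // The hand-off to garbage collection happens outside the critical section.
    for node in garbage {
        // Safety: the nodes were unlinked above and are no longer reachable.
        unsafe { guard.defer_destroy(node) };
    }
}
```
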
@jonhoo
Owner

jonhoo commented Mar 28, 2020

Another thing to look into is whether we can make fewer calls to defer_destroy by batching up all the garbage any given method call accumulates, and then freeing all of it in one defer closure instead.
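
As a hedged sketch of what that could look like (the helper name is made up, and it assumes the garbage was originally allocated with `Owned::new`, so that `Owned::from_raw` is the matching way to free it):

```rust
use crossbeam_epoch::{Guard, Owned, Shared};

/// Free a whole batch of unlinked nodes with a single deferred closure,
/// instead of one `defer_destroy` call per node. Illustrative only.
///
/// Safety: the caller must guarantee the pointers came from `Owned::new` and
/// are no longer reachable by any other thread.
unsafe fn defer_batch<T>(guard: &Guard, garbage: Vec<Shared<'_, T>>) {
    // Convert to raw pointers so the closure does not borrow the guard.
    let raw: Vec<*mut T> = garbage.iter().map(|s| s.as_raw() as *mut T).collect();
    // One closure (and at most one allocation inside crossbeam) for the
    // entire batch.
    let free_all = move || {
        for ptr in raw {
            // Safety: forwarded from this function's contract.
            drop(unsafe { Owned::from_raw(ptr) });
        }
    };
    // Safety: forwarded from this function's contract; the epoch scheme
    // ensures the closure only runs once no reader can still hold a
    // reference to these nodes.
    unsafe { guard.defer_unchecked(free_all) };
}
```
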

@ibraheemdev
Collaborator

#102 addresses the performance issues, but it also means that each value now comes with some extra memory required for the intrusive linked lists used by seize. It would be possible to delay allocating some of that memory until a value is actually retired, but that then means that retire has to allocate, and there is some extra pointer indirection for every thread that decrements the reference count of a retired batch. The final thread will also have to follow the pointer of every value in the batch. This should reduce memory usage in general, but will likely have a significant performance impact for delete-heavy workloads.
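
To make the trade-off a bit more concrete, here is a rough picture of the two layouts with purely illustrative stand-in types (this is not how seize actually defines them):

```rust
// Simplified stand-in for the intrusive per-value state (batch link,
// reference count, ...) used by the reclamation scheme.
struct ReclaimNode {
    next: *mut ReclaimNode,
    refs: usize,
}

// Current approach after #102: every value carries its reclamation node
// inline, even if it is never removed from the map.
struct LinkedInline<T> {
    node: ReclaimNode,
    value: T,
}

// Alternative: values stay lean, and the node is only allocated when the
// value is retired. `retire` now has to allocate, and threads that process a
// retired batch must follow one extra pointer per value.
struct LeanValue<T> {
    value: T,
}

struct RetiredEntry<T> {
    node: ReclaimNode,
    value: *mut LeanValue<T>, // boxed only at retirement time
}
```
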

@ibraheemdev
Collaborator

ibraheemdev commented Mar 2, 2022

I ran some tests with Linked<T> removed altogether and saw pretty significant performance boosts (~30%) in read-heavy workloads, with diminishing returns as thread counts rose, eventually meeting at $NUM_CPUS. In a mixed insert/delete workload there was a constant overhead of ~20% for the extra alloc/dealloc when retiring. This could likely be amortized by reusing list nodes.

@ibraheemdev
Collaborator

ibraheemdev commented Mar 29, 2022

All right, so here is what I am thinking.

  • Collectors store a list of pre-allocated nodes. To avoid frequent contention on this list, nodes are grouped into batches (the same batches that threads currently use to collect retired values before reclamation).
  • When values are retired, threads try the following steps in order of priority (a rough sketch follows this list):
    1. Take a node from a local batch.
    2. Take a batch from the global list, then take a node from it (as in step 1).
    3. Allocate a node.
  • When batches are reclaimed, threads add them to the global list for reuse.
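
A minimal sketch of that retirement path, with Collector, Batch, and the thread-local state reduced to stand-ins (none of this is seize's real API):

```rust
use std::sync::Mutex;

// Simplified stand-ins; none of this is seize's real API.
struct Node;

struct Batch {
    nodes: Vec<Box<Node>>,
}

struct Collector {
    // Global list of pre-allocated batches, refilled as batches are reclaimed.
    free_batches: Mutex<Vec<Batch>>,
}

#[derive(Default)]
struct ThreadState {
    batch: Option<Batch>,
}

impl Collector {
    // Retirement path in order of priority:
    // 1. take a node from the thread's local batch,
    // 2. otherwise take a batch from the global list and take a node from it,
    // 3. otherwise fall back to allocating a fresh node.
    fn acquire_node(&self, local: &mut ThreadState) -> Box<Node> {
        if let Some(node) = local.batch.as_mut().and_then(|b| b.nodes.pop()) {
            return node;
        }
        if let Some(mut batch) = self.free_batches.lock().unwrap().pop() {
            let node = batch.nodes.pop().unwrap_or_else(|| Box::new(Node));
            local.batch = Some(batch);
            return node;
        }
        // Slow path: nothing pre-allocated is available.
        Box::new(Node)
    }

    // When a batch is reclaimed, hand it back for reuse (skipped if the user
    // opted out of batch reuse).
    fn recycle(&self, batch: Batch) {
        self.free_batches.lock().unwrap().push(batch);
    }
}
```
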

Batch reuse would be optional. If a read-heavy program cares more about memory usage than allocation overhead, it can opt out. Retirement will then always involve allocating a node, and reclamation will involve an extra dealloc.

Another option would be for the thread that reclaims a batch to first try to take it for itself, before returning it to the collector. However, I'm not sure this optimization is worthwhile as reader threads may end up taking batches that they never end up using, and writer threads are likely to already have taken a batch.

Using this strategy, Linked<T> will only need to reserve an extra usize to record the current epoch value, as opposed to storing the node inline. I would like to make it possible to optionally create nodes on link (as is the current behavior), but I'm not sure this would be possible without some funky generics, and the scheme outlined above seems pretty good :)
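
Under that scheme, the per-value layout would shrink to something like the following (illustrative only, not seize's actual definition):

```rust
// A single extra word recording the epoch replaces the inline reclamation
// node (illustrative layout only).
struct Linked<T> {
    epoch: usize,
    value: T,
}
```
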
