-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EventLoop: store Timers in min Pairing Heap [fixup #14996] #15206
EventLoop: store Timers in min Pairing Heap [fixup #14996] #15206
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't comment on how it is used, but I have spent a bit of time polishing a paired heap, so some comments on that..
src/crystal/pointer_pairing_heap.cr
Outdated
end | ||
|
||
# Twopass merge of the children of *node* into pairs of two. | ||
private def merge_pairs(a : T*) : T* | Nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big heaps are allowed? This implementation will create a runtime stack n
elements deep where n
is the amount of elements added since the latest shift?
or delete
. That is possibly not an issue if it is hundreds of elements, but with many thousands of elements queued at the same time may become a problem. Compare https://github.com/yxhuvud/pairing_heap/blob/master/src/pairing_heap.cr#L95 which also is a bit more explicit in the code about what the two faces are (the two while loops).
(fun fact: This is where almost all the time of the algorithm is spent - the algorithm has decent throughput, implementation complexity and time complexities, but it do suffer a bit from latency spikes, which can happen if a lot is enqueued at once)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this is the current issue. With one million timers enqueued at once, no delete and thus no rebalance, it fills the 8KB of my thread main stack.
Thanks for the help, I'll look into it. It's much simpler!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thank you @yxhuvud 🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, thank you. Only thing I'm not happy about here is that it won't be in a released version of crystal before advent of code starts, meaning I still will have to copy-paste my copy of it into my solutions :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where almost all the time of the algorithm is spent
I'm wondering if the auxiliary insertion buffer wouldn't help. Of course not by itself, since it would behave exactly the same (twopass + merge on delete/min), but what if the buffer has a maximum size, so it doesn't grow too big? 🤔
Simpler PointerPairingHeap: let the Timers object deal with knowing whether we added to HEAD or actually deleted a timer, etc. Fix potential stack-overflow when merging pairs after inserting _lots_ of timers (over 1 million fills a 8KB thread stack).
586496e
to
4a63e07
Compare
There's something odd happening on macOS. The specs don't seem to start or something hangs and we don't get reports on the screen 😢 Both macOS CI are sitting until the job times out. |
…evloop+block-arena+pairing-heap
Should this have been merged before the rfc? |
Dunno, but there are at least some existing instances of where the pairing heap can be useful in the existing code, like for example at https://github.com/crystal-lang/crystal/blob/master/src/crystal/system/win32/event_loop_iocp.cr#L8 . (though it seems to have an inclusion check that may be new since I saw that one last. hmm) |
This is an MVP implementation that's dearly necessary to fix some heavy performance issues. It's a relatively simple implementation that plugs right in without any conceptual changes. RFC 0012 considers the next steps on how we could improve timer stores even further in the future. |
Related to RFC #0012.
Replaces the
Deque
used in #14996 for a min Pairing Heap which is a kind of Mergeable Heap and is one of the best performing heap in practical tests when arbitrary deletions are required (think cancelling a timeout), otherwise a D-ary Heap (e.g. 4-heap) will usually perform better. See the A Nearly-Tight Analysis of Multipass Pairing Heaps paper or the Wikipedia page for more details.The implementation itself is based on the Pairing Heaps: Experiments and Analysis paper, and merely implements a recursive twopass algorithm (the auxiliary twopass might perform even better). The
Crystal::PointerPairingList(T)
type is generic and relies on intrusive nodes (the links are intoT
) to avoid extra allocations for the nodes (same asCrystal::PointerLinkedList(T)
). It also requires aT#heap_compare
method, so we can use the same type for a min or max heap, or to build a more complex comparison.Note: I also tried a 4-heap, and while it performs very well and only needs a flat array, the arbitrary deletion (e.g. cancelling timeout) needs a linear scan and its performance quickly plummets, even at low occupancy, and becomes painfully slow at higher occupancy (tens of microseconds on each delete, while the pairing heap does it in tens of nanoseconds).
Follow up to #14996
Builds on top of #15205, you should open the last commit for just the pairing heap.