Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventLoop: store Timers in min Pairing Heap [fixup #14996] #15206

Conversation

ysbaddaden
Copy link
Contributor

@ysbaddaden ysbaddaden commented Nov 19, 2024

Related to RFC #0012.

Replaces the Deque used in #14996 for a min Pairing Heap which is a kind of Mergeable Heap and is one of the best performing heap in practical tests when arbitrary deletions are required (think cancelling a timeout), otherwise a D-ary Heap (e.g. 4-heap) will usually perform better. See the A Nearly-Tight Analysis of Multipass Pairing Heaps paper or the Wikipedia page for more details.

The implementation itself is based on the Pairing Heaps: Experiments and Analysis paper, and merely implements a recursive twopass algorithm (the auxiliary twopass might perform even better). The Crystal::PointerPairingList(T) type is generic and relies on intrusive nodes (the links are into T) to avoid extra allocations for the nodes (same as Crystal::PointerLinkedList(T)). It also requires a T#heap_compare method, so we can use the same type for a min or max heap, or to build a more complex comparison.

Note: I also tried a 4-heap, and while it performs very well and only needs a flat array, the arbitrary deletion (e.g. cancelling timeout) needs a linear scan and its performance quickly plummets, even at low occupancy, and becomes painfully slow at higher occupancy (tens of microseconds on each delete, while the pairing heap does it in tens of nanoseconds).

Follow up to #14996

Builds on top of #15205, you should open the last commit for just the pairing heap.

@ysbaddaden ysbaddaden changed the title Store Timers in Min Pairing Heap (lifetime Event Loop) EventLoop: store Timers in min Pairing Heap Nov 19, 2024
@ysbaddaden ysbaddaden self-assigned this Nov 19, 2024
Copy link
Contributor

@yxhuvud yxhuvud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't comment on how it is used, but I have spent a bit of time polishing a paired heap, so some comments on that..

src/crystal/pointer_pairing_heap.cr Show resolved Hide resolved
src/crystal/pointer_pairing_heap.cr Show resolved Hide resolved
src/crystal/pointer_pairing_heap.cr Show resolved Hide resolved
end

# Twopass merge of the children of *node* into pairs of two.
private def merge_pairs(a : T*) : T* | Nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big heaps are allowed? This implementation will create a runtime stack n elements deep where n is the amount of elements added since the latest shift? or delete. That is possibly not an issue if it is hundreds of elements, but with many thousands of elements queued at the same time may become a problem. Compare https://github.com/yxhuvud/pairing_heap/blob/master/src/pairing_heap.cr#L95 which also is a bit more explicit in the code about what the two faces are (the two while loops).

(fun fact: This is where almost all the time of the algorithm is spent - the algorithm has decent throughput, implementation complexity and time complexities, but it do suffer a bit from latency spikes, which can happen if a lot is enqueued at once)

Copy link
Contributor Author

@ysbaddaden ysbaddaden Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this is the current issue. With one million timers enqueued at once, no delete and thus no rebalance, it fills the 8KB of my thread main stack.

Thanks for the help, I'll look into it. It's much simpler!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Thank you @yxhuvud 🙇

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, thank you. Only thing I'm not happy about here is that it won't be in a released version of crystal before advent of code starts, meaning I still will have to copy-paste my copy of it into my solutions :)

Copy link
Contributor Author

@ysbaddaden ysbaddaden Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where almost all the time of the algorithm is spent

I'm wondering if the auxiliary insertion buffer wouldn't help. Of course not by itself, since it would behave exactly the same (twopass + merge on delete/min), but what if the buffer has a maximum size, so it doesn't grow too big? 🤔

Simpler PointerPairingHeap: let the Timers object deal with knowing
whether we added to HEAD or actually deleted a timer, etc.

Fix potential stack-overflow when merging pairs after inserting _lots_
of timers (over 1 million fills a 8KB thread stack).
@ysbaddaden ysbaddaden force-pushed the feature/lifetime-evloop+block-arena+pairing-heap branch from 586496e to 4a63e07 Compare November 21, 2024 17:44
@ysbaddaden ysbaddaden changed the title EventLoop: store Timers in min Pairing Heap EventLoop: store Timers in min Pairing Heap [fixup #14996] Nov 22, 2024
@ysbaddaden
Copy link
Contributor Author

There's something odd happening on macOS. The specs don't seem to start or something hangs and we don't get reports on the screen 😢

Both macOS CI are sitting until the job times out.

@straight-shoota
Copy link
Member

Yes, that's a regression in master from #14651.
#15212 should fix this

@ysbaddaden ysbaddaden marked this pull request as ready for review November 22, 2024 12:09
@straight-shoota straight-shoota added this to the 1.15.0 milestone Nov 25, 2024
@straight-shoota straight-shoota merged commit 6928ca7 into crystal-lang:master Nov 26, 2024
69 checks passed
@ysbaddaden ysbaddaden deleted the feature/lifetime-evloop+block-arena+pairing-heap branch November 27, 2024 12:53
@RX14
Copy link
Contributor

RX14 commented Nov 27, 2024

Should this have been merged before the rfc?

@yxhuvud
Copy link
Contributor

yxhuvud commented Nov 27, 2024

Dunno, but there are at least some existing instances of where the pairing heap can be useful in the existing code, like for example at https://github.com/crystal-lang/crystal/blob/master/src/crystal/system/win32/event_loop_iocp.cr#L8 . (though it seems to have an inclusion check that may be new since I saw that one last. hmm)

@straight-shoota
Copy link
Member

This is an MVP implementation that's dearly necessary to fix some heavy performance issues. It's a relatively simple implementation that plugs right in without any conceptual changes.

RFC 0012 considers the next steps on how we could improve timer stores even further in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants