feat: use many tasks to order streams and discover undelivered events at startup #620
Merged
Conversation
dav1do force-pushed the feat/parallelize-undelivered branch from a8bef06 to aa30168 on November 27, 2024 at 05:03
dav1do force-pushed the feat/parallelize-undelivered branch from aa30168 to 7031f11 on November 27, 2024 at 22:19
dav1do force-pushed the feat/parallelize-undelivered branch from 7031f11 to d6b152e on November 27, 2024 at 22:20
dav1do force-pushed the feat/parallelize-undelivered branch 2 times, most recently from 5f1f615 to 83308f9 on December 2, 2024 at 22:02
dav1do force-pushed the feat/parallelize-undelivered branch from ee59405 to df8e38f on December 3, 2024 at 00:07
dav1do force-pushed the feat/parallelize-undelivered branch from df8e38f to 1d03275 on December 3, 2024 at 01:38
dav1do force-pushed the feat/parallelize-undelivered branch from 1d03275 to c52992f on December 3, 2024 at 02:35
dav1do force-pushed the feat/parallelize-undelivered branch from c52992f to 8601811 on December 3, 2024 at 03:24
dav1do force-pushed the fix/sqlite-config branch from 404974b to dfdac38 on December 3, 2024 at 03:40
dav1do force-pushed the feat/parallelize-undelivered branch from 8601811 to 7d16585 on December 3, 2024 at 03:41
dav1do force-pushed the feat/parallelize-undelivered branch from 7d16585 to f389ea2 on December 4, 2024 at 16:20
… start up

This will help some, as we're able to do all the sorting/reading of event history in one task while the other finds new events that need to be added. It is similar to the insert/ordering task flow now. We can process each stream individually, so we spawn tasks to handle batches of streams, which lets us do db reads in parallel.
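The batching idea in this commit (group events by stream, then order each batch of streams on its own task) can be sketched with plain threads. This is a minimal illustration, not the PR's code: the `Event` type and function name are invented, and a lexicographic sort stands in for the real per-stream history ordering.

```rust
use std::collections::HashMap;
use std::thread;

// (stream_id, event payload) -- hypothetical shape for illustration
type Event = (u64, String);

// Group events by stream, split the streams into batches, and "order"
// each batch on its own thread. A sort stands in for the real ordering,
// which also does database reads.
fn order_streams_in_parallel(events: Vec<Event>, num_tasks: usize) -> HashMap<u64, Vec<String>> {
    // group incoming events by their stream id
    let mut by_stream: HashMap<u64, Vec<String>> = HashMap::new();
    for (stream, payload) in events {
        by_stream.entry(stream).or_default().push(payload);
    }
    let streams: Vec<(u64, Vec<String>)> = by_stream.into_iter().collect();

    // split the streams into at most num_tasks batches
    let n = num_tasks.max(1);
    let batch_size = ((streams.len() + n - 1) / n).max(1);

    let handles: Vec<_> = streams
        .chunks(batch_size)
        .map(|batch| {
            let batch = batch.to_vec();
            thread::spawn(move || {
                batch
                    .into_iter()
                    .map(|(id, mut evts)| {
                        evts.sort(); // stand-in for ordering a stream's events
                        (id, evts)
                    })
                    .collect::<Vec<_>>()
            })
        })
        .collect();

    // each task sends its ordered streams back; here we just collect them
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}
```

Because each stream is independent, the batches can be processed without coordination; in the PR the tasks also perform database reads, which is why parallelism pays off even though the ordering itself is CPU-bound.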
dav1do force-pushed the feat/parallelize-undelivered branch from f389ea2 to cd39825 on December 5, 2024 at 19:21
dav1do changed the title from "feat: use two tasks to process undelivered events at startup" to "feat: use many tasks to order streams and discover undelivered events at startup" on Dec 5, 2024
dav1do commented on Dec 5, 2024
nathanielc approved these changes on Dec 5, 2024
github-merge-queue bot removed this pull request from the merge queue due to failed status checks on Dec 5, 2024
Merged
After an IPFS migration, we have to review all the data in the database to make sure we have the complete stream history before we can send events out of the API. We originally kept it simple: a single task would read events, process them, and repeat. This was taking far too long on large datasets (e.g. hundreds of GBs). Now we spawn multiple tasks to read events from the database, and they send events over a channel to the ordering task (as we do during normal operation). The ordering task was also modified to spawn multiple tasks that process events by stream and order them. Both changes proved necessary during testing, as otherwise one side would be waiting on the other. This keeps a solid rate of processing going, and we've seen a substantial improvement in runtime (roughly 60-100x faster).
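The pipeline shape described above (many discovery readers feeding one ordering task over a bounded channel) can be sketched with std threads and an mpsc channel standing in for tokio tasks and channels. The `Event` type and function names here are illustrative, not the PR's API:

```rust
use std::sync::mpsc;
use std::thread;

// (stream_id, rowid) -- hypothetical event shape for illustration
type Event = (u64, u64);

// Several "discovery" readers send events into a bounded channel; one
// "ordering" consumer drains it. The PR sized its channel at 10000.
fn run_pipeline(num_readers: u64, rows_per_reader: u64) -> usize {
    let (tx, rx) = mpsc::sync_channel::<Event>(10_000);

    let mut readers = Vec::new();
    for task in 0..num_readers {
        let tx = tx.clone();
        readers.push(thread::spawn(move || {
            for i in 0..rows_per_reader {
                // each reader "discovers" events from its share of the rows
                tx.send((task, task * rows_per_reader + i)).unwrap();
            }
        }));
    }
    drop(tx); // close our handle so the receiver sees the end of the stream

    // the ordering task would group these by stream; here we just count
    let delivered = rx.iter().count();
    for r in readers {
        r.join().unwrap();
    }
    delivered
}
```

The bounded channel provides backpressure: if the ordering side falls behind, the readers block rather than buffering an unbounded backlog.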
On the discovery side: at startup, we spawn 16 tasks to read batches from the database. The number of events read per query was reduced to 250, as batches of 1000 were taking seconds. The values are somewhat arbitrary, but this seemed like a "fast enough" choice during testing (the goal is simply to keep the channel full). The event data is partitioned using

(rowid % number_tasks) = task_number

so we don't have to do anything clever to split the data into batches up front. Each task starts from the beginning and picks up any events that have been missed. Once the initial pass finishes, subsequent runs are fast, so we spawn the tasks regardless of whether they're needed.

On the ordering side, a few changes were made. First, the channel size was reduced to 10000 (the previous value was far too large), and we try to empty the channel before doing any ordering, since we have more tasks to process the set, and newly arrived events may avoid database reads if they're for the same stream. Once events are grouped by stream, we split the streams into batches and spawn 1-16 tasks to process each batch. This processing includes CPU-bound work but also requires database reads, so multiple tasks have been beneficial. The tasks then send their ordered data back to the manager, which handles writing to the database. During testing, I made a change to remove a read-only connection from the pool (and allow it to grow back afterward) for each of these tasks. It didn't seem to make an obvious difference, but it may be useful to revisit.
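The modulo partition can be illustrated without sqlite. Assuming the rowids of the events table (the query text below is a rough approximation, not the PR's exact SQL), each task would repeatedly run something shaped like `SELECT ... WHERE (rowid % 16) = ?task AND rowid > ?last ORDER BY rowid LIMIT 250`. Here a Vec of rowids stands in for the table:

```rust
// number of discovery tasks spawned at startup (per the PR)
const NUM_TASKS: u64 = 16;

/// Assign each rowid to the task that will read it, using
/// (rowid % number_tasks) = task_number. The partitions are disjoint
/// and cover every row, so no up-front splitting is needed.
fn partition_rowids(all_rowids: &[u64]) -> Vec<Vec<u64>> {
    let mut partitions = vec![Vec::new(); NUM_TASKS as usize];
    for &rowid in all_rowids {
        partitions[(rowid % NUM_TASKS) as usize].push(rowid);
    }
    partitions
}
```

Because sqlite assigns rowids roughly sequentially, the modulus also spreads the rows evenly across the 16 tasks, regardless of how many events the table holds.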