feat: use many tasks to order streams and discover undelivered events at startup #620

dav1do · 2024-11-27T03:19:52Z

After an IPFS migration, we have to review all the data in the database to make sure we have the complete stream history before we can send events out of the API. We originally kept it simple: a single task would read a batch of events, process it, and repeat. This took far too long on large datasets (e.g. 100s of GBs). Now we spawn multiple tasks that read events from the database and send them over a channel to the ordering task (as we do during normal operation). The ordering task was also modified to spawn multiple tasks that process events by stream and order them. Both changes appeared necessary during testing, as otherwise one side would wait on the other. Together they keep a steady processing rate going, and we've seen a substantial improvement in runtime (~60-100x faster).

On the discovery side: at startup, we spawn 16 tasks to read batches from the database. The number of events read per batch was reduced from 1000 to 250, as reading 1000 was taking seconds. The values are somewhat arbitrary, but they seemed "fast enough" during testing (the goal is simply to keep the channel full). The event data is partitioned using (rowid % number_tasks) = task_number, so we don't have to do anything clever to split the data into batches up front. Each task starts from the beginning and picks up any events that have been missed. Once the initial pass has finished, subsequent runs are fast, so we spawn the tasks regardless of whether they're needed.
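A minimal sketch of the partitioned discovery loop, for illustration only: it uses std threads and a bounded `std::sync::mpsc::sync_channel` in place of async tasks, and an in-memory slice in place of the database. `EventRow`, `read_batch`, `reader`, and `discover` are hypothetical names; the task count (16), batch size (250), and channel capacity (10000) mirror the numbers in this PR. Each reader pages through its own `rowid % 16` partition from the start until it gets an empty batch.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an undelivered event row.
#[derive(Clone, Debug)]
struct EventRow {
    rowid: u64,
    stream: String,
}

const NUM_TASKS: u64 = 16;
const BATCH_SIZE: usize = 250;
const CHANNEL_CAPACITY: usize = 10_000;

/// In-memory stand-in for a keyset-paginated query over one partition, roughly
/// `... WHERE rowid > ?after AND rowid % ?n = ?task ORDER BY rowid LIMIT ?limit`.
/// `table` must be sorted by ascending rowid.
fn read_batch(table: &[EventRow], task: u64, n: u64, after: u64, limit: usize) -> Vec<EventRow> {
    table
        .iter()
        .filter(|r| r.rowid > after && r.rowid % n == task)
        .take(limit)
        .cloned()
        .collect()
}

/// One reader: page through its own partition from the beginning until empty.
fn reader(table: Arc<Vec<EventRow>>, task: u64, tx: SyncSender<EventRow>) {
    let mut after = 0; // assumes positive rowids, so 0 means "from the beginning"
    loop {
        let batch = read_batch(&table, task, NUM_TASKS, after, BATCH_SIZE);
        let Some(last) = batch.last() else { break };
        after = last.rowid;
        for row in batch {
            // send() blocks while the channel is full, which throttles the readers
            if tx.send(row).is_err() {
                return; // the ordering side has shut down
            }
        }
    }
}

/// Spawn NUM_TASKS readers feeding one bounded channel and collect everything
/// they send (the collecting loop stands in for the ordering task).
fn discover(table: Vec<EventRow>) -> Vec<EventRow> {
    let table = Arc::new(table);
    let (tx, rx) = sync_channel(CHANNEL_CAPACITY);
    let handles: Vec<_> = (0..NUM_TASKS)
        .map(|task| {
            let (table, tx) = (Arc::clone(&table), tx.clone());
            thread::spawn(move || reader(table, task, tx))
        })
        .collect();
    drop(tx); // the receiver's iterator ends once every reader has finished
    let received: Vec<EventRow> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("reader panicked");
    }
    received
}
```

Because the partitions are disjoint and together cover every rowid, each row is delivered exactly once without any up-front splitting of the table.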

On the ordering side, a few changes were made. First, the channel size was reduced to 10000 (the previous value was far too large), and we try to empty the channel before doing any ordering: we now have more tasks to process the set, and any additional events found may avoid database reads if they belong to the same stream. Once events are grouped by stream, we split the streams into batches and spawn 1-16 tasks to process the batches. This processing includes CPU-bound work but also requires database reads, so multiple tasks have been beneficial. The tasks then send their ordered data back to the manager, which handles writing to the database. During testing, I tried removing a read-only (RO) connection from the pool for each of these tasks (allowing the pool to grow afterward). It didn't make an obvious difference, but it may be worth revisiting.
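The drain, group, and fan-out steps above might look roughly like the sketch below. It is an illustration under assumptions, not the code in `ordering_task.rs`: std threads stand in for tasks, `Event`, `drain`, and `order_streams` are hypothetical names, and sorting by a `height` field is a placeholder for the real per-stream ordering (which also reads stream history from the database).

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, RecvError};
use std::thread;

/// Hypothetical event: the stream it belongs to and its position in that stream.
#[derive(Debug)]
struct Event {
    stream: String,
    height: u64,
}

const MAX_WORKERS: usize = 16;

/// Wait for at least one event, then empty whatever else is already queued so
/// events for the same stream are handled together.
fn drain(rx: &Receiver<Event>) -> Result<Vec<Event>, RecvError> {
    let mut events = vec![rx.recv()?];
    while let Ok(ev) = rx.try_recv() {
        events.push(ev);
    }
    Ok(events)
}

/// Group events by stream, split the streams into at most MAX_WORKERS batches,
/// order each batch on its own thread, and collect the results on the calling
/// thread (standing in for the manager that writes to the database).
fn order_streams(events: Vec<Event>) -> HashMap<String, Vec<Event>> {
    let mut by_stream: HashMap<String, Vec<Event>> = HashMap::new();
    for ev in events {
        by_stream.entry(ev.stream.clone()).or_default().push(ev);
    }
    let mut streams: Vec<(String, Vec<Event>)> = by_stream.into_iter().collect();
    if streams.is_empty() {
        return HashMap::new();
    }
    let workers = streams.len().min(MAX_WORKERS); // 1..=16 since streams is non-empty
    let per_batch = (streams.len() + workers - 1) / workers; // ceiling division

    let (tx, rx) = channel();
    let mut handles = Vec::new();
    while !streams.is_empty() {
        let take = per_batch.min(streams.len());
        let batch: Vec<(String, Vec<Event>)> = streams.drain(..take).collect();
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for (stream, mut evs) in batch {
                // Placeholder for the real per-stream work (history reads + ordering).
                evs.sort_by_key(|e| e.height);
                tx.send((stream, evs)).expect("manager hung up");
            }
        }));
    }
    drop(tx); // rx.iter() ends once every worker has dropped its sender
    let ordered: HashMap<String, Vec<Event>> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    ordered
}
```

Splitting by stream works because each stream's history can be ordered independently, so the workers share no state and only the manager touches the write path.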

dav1do temporarily deployed to dev-qa-2024 November 27, 2024 04:59 — with GitHub Actions Inactive

dav1do force-pushed the feat/parallelize-undelivered branch from a8bef06 to aa30168 November 27, 2024 05:03

dav1do temporarily deployed to dev-qa-2024 November 27, 2024 05:22 — with GitHub Actions Inactive

dav1do marked this pull request as ready for review November 27, 2024 18:06

dav1do requested review from nathanielc and a team as code owners November 27, 2024 18:06

dav1do requested review from sam701 and removed request for a team November 27, 2024 18:06

Base automatically changed from chore/undelivered-logs to main November 27, 2024 18:42

dav1do force-pushed the feat/parallelize-undelivered branch from aa30168 to 7031f11 November 27, 2024 22:19

dav1do requested a review from stbrody as a code owner November 27, 2024 22:19

dav1do changed the base branch from main to chore/db-optimize November 27, 2024 22:19

dav1do force-pushed the feat/parallelize-undelivered branch from 7031f11 to d6b152e November 27, 2024 22:20

dav1do temporarily deployed to dev-qa-2024 November 27, 2024 22:39 — with GitHub Actions Inactive

Base automatically changed from chore/db-optimize to main November 28, 2024 06:03

dav1do force-pushed the feat/parallelize-undelivered branch 2 times, most recently from 5f1f615 to 83308f9 December 2, 2024 22:02

dav1do temporarily deployed to dev-qa-2024 December 2, 2024 22:21 — with GitHub Actions Inactive

dav1do temporarily deployed to dev-qa-2024 December 2, 2024 22:57 — with GitHub Actions Inactive

dav1do marked this pull request as draft December 2, 2024 23:25

dav1do temporarily deployed to dev-qa-2024 December 2, 2024 23:43 — with GitHub Actions Inactive

dav1do force-pushed the feat/parallelize-undelivered branch from ee59405 to df8e38f December 3, 2024 00:07

dav1do had a problem deploying to dev-qa-2024 December 3, 2024 00:26 — with GitHub Actions Failure

dav1do force-pushed the feat/parallelize-undelivered branch from df8e38f to 1d03275 December 3, 2024 01:38

dav1do temporarily deployed to dev-qa-2024 December 3, 2024 01:56 — with GitHub Actions Inactive

dav1do force-pushed the feat/parallelize-undelivered branch from 1d03275 to c52992f December 3, 2024 02:35

dav1do changed the base branch from main to fix/sqlite-config December 3, 2024 02:41

dav1do temporarily deployed to dev-qa-2024 December 3, 2024 02:54 — with GitHub Actions Inactive

dav1do force-pushed the feat/parallelize-undelivered branch from c52992f to 8601811 December 3, 2024 03:24

dav1do force-pushed the fix/sqlite-config branch from 404974b to dfdac38 December 3, 2024 03:40

dav1do force-pushed the feat/parallelize-undelivered branch from 8601811 to 7d16585 December 3, 2024 03:41

dav1do temporarily deployed to dev-qa-2024 December 3, 2024 03:59 — with GitHub Actions Inactive

dav1do force-pushed the feat/parallelize-undelivered branch from 7d16585 to f389ea2 December 4, 2024 16:20

dav1do temporarily deployed to dev-qa-2024 December 4, 2024 16:40 — with GitHub Actions Inactive

Base automatically changed from fix/sqlite-config to main December 5, 2024 17:08

dav1do added 5 commits December 5, 2024 10:26

feat: use a reader and a writer task to process undelivered events at start up

7065839

This will help some as we're able to do all the sorting/reading of event history in one task while the other finds new events that need to be added. It is similar to the insert/ordering task flow now.

chore: add more undelivered startup tests

378ea33

chore: clippy

68c85c8

feat: use multiple tasks to read events during ordering

b21f7dc

we can process each stream individually, so we spawn tasks to handle batches of streams so we can do db reads in parallel.

feat: use multiple tasks order events for streams

cd39825

dav1do force-pushed the feat/parallelize-undelivered branch from f389ea2 to cd39825 December 5, 2024 19:21

dav1do changed the title ~~feat: use two tasks to process undelivered events at startup~~ feat: use many tasks to order streams and discover undelivered events at startup Dec 5, 2024

dav1do temporarily deployed to dev-qa-2024 December 5, 2024 19:40 — with GitHub Actions Inactive

dav1do commented Dec 5, 2024

event-svc/src/event/ordering_task.rs

event-svc/src/event/ordering_task.rs

event-svc/src/event/ordering_task.rs

dav1do marked this pull request as ready for review December 5, 2024 20:03

nathanielc approved these changes Dec 5, 2024

event-svc/src/event/ordering_task.rs

dav1do added 4 commits December 5, 2024 13:38

chore: fix nonsense doc comment

accce0b

chore: reduce ordering task message to debug on startup

d1bf371

chore: one more info -> debug

de89560

chore: fmt

d3c0bae

dav1do had a problem deploying to dev-qa-2024 December 5, 2024 21:20 — with GitHub Actions Failure

dav1do temporarily deployed to dev-qa-2024 December 5, 2024 21:34 — with GitHub Actions Inactive

dav1do added this pull request to the merge queue Dec 5, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 5, 2024

dav1do added this pull request to the merge queue Dec 5, 2024

Merged via the queue into main with commit c959cc3 Dec 5, 2024
5 checks passed

dav1do deleted the feat/parallelize-undelivered branch December 5, 2024 23:03

smrz2001 mentioned this pull request Dec 9, 2024

chore: version v0.46.0 #630

Merged
