atproto_firehose subscriber falls behind when atproto-hub is busy #1641

snarfed · 2024-12-20T17:21:16Z

Started seeing an ugly failure mode in atproto-hub yesterday: our firehose consumer slowed down below the Bluesky relay's event rate, so we started falling behind. atproto-hub is CPU bound, and we had many (8-10) other firehose clients consuming our firehose at the same time, which is high. I dropped ROLLBACK_WINDOW from 200k seqs to 50k and added a second core to atproto-hub, which seemed to help, but we were still occasionally falling behind for a bit and then catching back up. Odd.

...and then this morning it got worse. I added snarfed/lexrpc@22d9fee to shed load by denying additional connections from the same IP after the first, which helped, so we're now out of the woods.
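For anyone following along, here's a rough sketch of the "one connection per IP" idea. This is not the code in that lexrpc commit, just a minimal illustration; `PerIpLimiter`, `TooManyConnections`, and `max_per_ip` are hypothetical names made up for this example, and it assumes the server can see each client's IP when a subscription opens.

```python
import threading
from collections import Counter
from contextlib import contextmanager


class TooManyConnections(Exception):
    """Raised when an IP already holds its maximum number of connections."""


class PerIpLimiter:
    """Hypothetical helper: counts open connections per client IP."""

    def __init__(self, max_per_ip=1):
        self.max_per_ip = max_per_ip
        self._open = Counter()          # ip -> number of open connections
        self._lock = threading.Lock()   # subscribers may run in separate threads

    @contextmanager
    def connection(self, ip):
        # Reject before doing any work if this IP is already at its limit.
        with self._lock:
            if self._open[ip] >= self.max_per_ip:
                raise TooManyConnections(ip)
            self._open[ip] += 1
        try:
            yield
        finally:
            # Always release the slot, even if the subscriber errors out.
            with self._lock:
                self._open[ip] -= 1
                if self._open[ip] <= 0:
                    del self._open[ip]
```

A subscription handler would wrap its event loop in `with limiter.connection(client_ip):` and turn `TooManyConnections` into whatever rejection the server normally sends. Rejecting up front keeps the cost of an extra client near zero, which matters when the process is already CPU bound.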

I still don't fully understand the failure mode though.


snarfed added the infra label Dec 20, 2024
