Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes in fetchAuthEvents #3447

Merged
merged 4 commits into from
Dec 17, 2024
Merged

Various fixes in fetchAuthEvents #3447

merged 4 commits into from
Dec 17, 2024

Conversation

S7evinK
Copy link
Contributor

@S7evinK S7evinK commented Dec 1, 2024

This might fix issues with state events gone missing.

@CLAassistant
Copy link

CLAassistant commented Dec 1, 2024

CLA assistant check
All committers have signed the CLA.

@codecov-commenter
Copy link

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@S7evinK
Copy link
Contributor Author

S7evinK commented Dec 11, 2024

Note that I'm happy with adding the failing Sytests to the blocklist, as they test implementation specific behavior.

@S7evinK S7evinK marked this pull request as ready for review December 11, 2024 17:50
@S7evinK S7evinK requested a review from a team as a code owner December 11, 2024 17:50
Copy link
Contributor

@kegsay kegsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No justification for the change, nor any tests.

What's the context here?

@neilalexander
Copy link
Contributor

neilalexander commented Dec 12, 2024

@kegsay These changes were cherry-picked in from Harmony. Overview:

  1. fetchAuthEvents has a causality/ordering issue and is broken today because it doesn't clear the auth set when it fetches a new set of events via federation. The earlier fetched events get incorrectly authed against the few later events that we did know about (if any), which is why so many auth events get unexpectedly rejected with PLs looking like they are trying to reset to an earlier state etc. That bug is now fixed.
  2. We need to correctly track the rejected state of auth events through the processing of the event so that fetchAuthEvents can determine whether we need to try to re-fetch those over federation and/or retry them, giving them a "second chance" to be allowed if they were initially rejected for the wrong reason (i.e. due to the above bug).

I didn't write tests and I probably won't at this point because I am happy with the logic of the changes. I've extensively tested them on my own instance and have seen a dramatic reduction in unexpected state resets and/or state going missing, as well as a near-total reduction of unexpected auth errors that aren't signature-related.

This is one of the things we've been missing for literally years now on why Dendrite was unexpectedly unreliable in some rooms with no obvious explanation. It's a huge enough fix to be probably even worthy of a release.

@S7evinK S7evinK merged commit e8b1a89 into main Dec 17, 2024
19 checks passed
@S7evinK S7evinK deleted the s7evink/fetch-auth-events branch December 17, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants