TestNullFeed is flaky #268
Not entirely sure what the purpose of this test is. It could use a bit more documentation as to what it's testing and how it's testing it.
Okay, so I think the problem is actually that the messages are not being fully deleted from mainbot before it attempts to re-replicate from bertbot. That throws off the validation: there is still a previous message, so at that point it has detected a forked feed, which AFAICT is what this creates. It's actually a pretty clever test if you step through it. So the failure does appear to be catching a legitimate problem - either
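To make the "forked feed" hypothesis concrete, here is a minimal sketch of the kind of append validation that would trip: if a leftover message survives the nulling, a re-replicated seq-1 message no longer extends the stored tail. The `msg` type, `hash`, and `checkAppend` are made-up stand-ins, not go-ssb's actual representation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// msg is a simplified stand-in for an SSB feed message (hypothetical type).
type msg struct {
	Seq      int
	Previous string // hash of the preceding message, "" for seq 1
	Content  string
}

func hash(m msg) string {
	h := sha256.Sum256([]byte(fmt.Sprintf("%d|%s|%s", m.Seq, m.Previous, m.Content)))
	return hex.EncodeToString(h[:])
}

// checkAppend mimics the validation that trips in the test: an incoming
// message must extend the stored tail, otherwise the feed looks forked.
func checkAppend(tail *msg, incoming msg) error {
	if tail == nil {
		if incoming.Seq != 1 {
			return fmt.Errorf("empty feed: expected seq 1, got %d", incoming.Seq)
		}
		return nil
	}
	if incoming.Seq != tail.Seq+1 {
		return fmt.Errorf("expected seq %d, got %d", tail.Seq+1, incoming.Seq)
	}
	if incoming.Previous != hash(*tail) {
		return fmt.Errorf("forked feed: previous hash mismatch at seq %d", incoming.Seq)
	}
	return nil
}

func main() {
	m1 := msg{Seq: 1, Content: "hello"}
	// A leftover m1 (not fully nulled) makes a re-replicated seq-1 message invalid:
	fresh := msg{Seq: 1, Content: "hello again"}
	fmt.Println(checkAppend(&m1, fresh))
	// ...whereas a clean append validates:
	m2 := msg{Seq: 2, Previous: hash(m1), Content: "world"}
	fmt.Println(checkAppend(&m2, msg{Seq: 3, Previous: hash(m2), Content: "!"}))
}
```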
Hmmm... another possibility: it's replicating the messages out of order. So far, I don't see how it could be any of my previous hypotheses.
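The out-of-order hypothesis would be cheap to check with an extra assertion over the replicated sequence numbers; something like this sketch (the `inOrder` helper is hypothetical, not in the codebase):

```go
package main

import "fmt"

// inOrder reports whether the replicated sequence numbers form the
// contiguous run the validator expects, starting at from. On failure it
// returns the index of the first offending entry.
func inOrder(seqs []int, from int) (int, bool) {
	want := from
	for i, s := range seqs {
		if s != want {
			return i, false
		}
		want++
	}
	return -1, true
}

func main() {
	// Replication delivering 1, 2, 4, 3 would fail validation at index 2:
	if i, ok := inOrder([]int{1, 2, 4, 3}, 1); !ok {
		fmt.Printf("out of order at index %d\n", i) // prints "out of order at index 2"
	}
}
```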
Interesting...I added a few more checks, and this one failed in one of the runs (which I terminated because I thought I had a bug in my changes, but subsequent re-runs did not have this FAIL):
Oddly enough, this is a different failure. This would be at the arny check here: About the only way I could think of for that to happen is if the act of publishing is not synchronous. Or the disk write isn't, which is entirely possible, I think. Still getting acquainted with the codebase.
Huh. Yet another different error from this test: https://github.com/ssbc/go-ssb/actions/runs/3744726663/jobs/6358410184#step:9:406
Whew, that last one's a doozy. If I read that right, since that failed, it means that indexing was still happening after the messages were nulled from the main log. Without any kind of index locking or any way to know if the indexes are up-to-date (see #251), that means that the indexes now need to be able to deal with messages being fed to them which have already been deleted.
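In other words, each index update would need a "message already nulled" branch. A toy sketch of what that tolerance could look like (the `entry`/`aboutIndex` types are invented for illustration):

```go
package main

import "fmt"

// entry models a slot in the receive log; a nil Value means the message
// was nulled (deleted) after the index was scheduled to process it.
type entry struct {
	Seq   int
	Value *string
}

// aboutIndex is a toy index that must tolerate nulled entries instead of
// assuming every message fed to it still exists.
type aboutIndex struct {
	names map[int]string
}

func (idx *aboutIndex) update(e entry) {
	if e.Value == nil {
		// Message was deleted out from under us: drop any stale state
		// rather than indexing a hole in the log.
		delete(idx.names, e.Seq)
		return
	}
	idx.names[e.Seq] = *e.Value
}

func main() {
	idx := &aboutIndex{names: map[int]string{}}
	v := "arny"
	idx.update(entry{Seq: 1, Value: &v})
	idx.update(entry{Seq: 1, Value: nil}) // nulled after the fact
	fmt.Println("entries:", len(idx.names)) // prints "entries: 0"
}
```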
So...it made it through checking that both logs were in a stable state where they should be before replication started. So they both had the correct sequence numbers at that point. So this pretty much had to have happened during the replication step. |
Now here's an interesting one:
The interesting thing is that it crashed here: ...which means
Here is the last debug message before it failed: Which means this one happened during replication, and several rounds of FSCK passed successfully. |
I think I might have to put this one down for a while. There are way too many (very different) failure cases in this one. I think fixing some of the other tests that test smaller bits of functionality might make tracking this one down easier. This one tests a lot of stuff in one test. |
Looks like another race condition: