Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: synchronize value for synchronized ranges #238

Merged

Conversation

nathanielc
Copy link
Collaborator

@nathanielc nathanielc commented Jan 23, 2024

This change updates the Recon protocol to check for any missing values when a synchronized range of keys is discovered. This way as nodes are synchronized they ensure they also have all values for their known keys.

TODO

  • The recon split changes need to merge first

Copy link

linear bot commented Jan 23, 2024

@nathanielc nathanielc requested a review from dav1do January 23, 2024 20:24
cat: [a: ∅, b: ∅, c: ∅]
-> value_req(b)
cat: [a: ∅, b: ∅, c: ∅]
<- value_resp(a: A)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be deterministic? cat will send value requests for a and b, and dog will send a value response for a before cat sends the value request for c?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along those lines.. in #243, I'm seeing some ordering changes to req/resp values cause failures and was hoping to get some help understanding what should happen and if it's a problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I regenerated the expect tests for my PR and they seem to be consistent in the that order for now. Will cause some more conflicts for you merging back in.. hopefully it's not bad. It should be merged shortly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So its not perfectly deterministic but it does seem to be stable for a given implementation. Meaning I do not expect the tests to be flaky but changes in the performance of the code might change the order.

The test driver explicitly uses a single thread to drive the message exchange, but its still possible because of the async nature of the code that messages can be delivered in a different order.

If we discover that the tests are somehow flaky we can explore ways to improve them. For example we could remove the assertions about state from each message and then allow for messages to be out of order using a new syntax. However I'd prefer to avoid that complexity if possible.

TL;DR we might have to update the expectations when we change to code but the tests should be stable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great, thanks.

Copy link
Contributor

@dav1do dav1do left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Has conflicts to resolve in order to run the checks. My open PR #243 will conflict but it shouldn't be too bad for me to resolve them if this goes in first.

&mut self,
range: RangeOpen<Self::Key>,
) -> Result<Vec<Self::Key>> {
let query = sqlx::query(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we need to check if the range is valid? In hash_range, we check if left >= right and return 0 but in other range related queries we don't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you updated this (or I just missed it before). I'm still curious if this is a concern of storage trait or it's going to be verified higher at the recon client/server level, but nothing to do.

cat: [a: ∅, b: ∅, c: ∅]
-> value_req(b)
cat: [a: ∅, b: ∅, c: ∅]
<- value_resp(a: A)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along those lines.. in #243, I'm seeing some ordering changes to req/resp values cause failures and was hoping to get some help understanding what should happen and if it's a problem.

@nathanielc nathanielc force-pushed the feature/ws1-1427-events-synchronize-correctness-tests branch from ce2c769 to 3b58c64 Compare January 24, 2024 21:29
Base automatically changed from feat/ws1-1432-recon-table to main January 24, 2024 22:57
@nathanielc nathanielc force-pushed the feature/ws1-1427-events-synchronize-correctness-tests branch from b4da4db to 83b8f75 Compare January 25, 2024 15:23
@nathanielc nathanielc requested review from smrz2001 and dav1do January 25, 2024 15:24
Copy link
Contributor

@dav1do dav1do left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A couple comments/questions, but I don't have any concerns that need to be addressed before merging.

@@ -91,45 +97,64 @@ where
// we insert the value first as it's possible we already have the key and can skip that step
// as it happens in a transaction, we'll roll back the value insert if the key insert fails and try again
if let Some(val) = item.value {
if self.insert_value_int(item.key, val, conn).await? {
// Update the value_retrieved flag, and report if the key already exists.
let key_exists = self.update_value_retrieved_int(item.key, conn).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to do this after inserting the value? it seems like we might have an issue if we try to set retrieved, but the key doesn't exist, and then we add the key with retrieved=false.

we could potentially modify insert_value_int to do this update as well and pass a bool to insert_key_int for value_retrieved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh, that's exactly what you did lol. nice! probably worth combining the update/insert in a future PR as we optimize some of these queries, but nothing to do now.

Copy link
Collaborator

@smrz2001 smrz2001 Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was just going to suggest combining the update/insert then saw your comment on refreshing the page, @dav1do 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! I didn't see yours when I hit resolve! I re-opened it, but I'm okay not addressing it now.

&mut self,
range: RangeOpen<Self::Key>,
) -> Result<Vec<Self::Key>> {
let query = sqlx::query(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you updated this (or I just missed it before). I'm still curious if this is a concern of storage trait or it's going to be verified higher at the recon client/server level, but nothing to do.

recon/src/recon.rs Show resolved Hide resolved
core/src/event_id.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@smrz2001 smrz2001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one question about determinism in the tests.

This change updates the Recon protocol to check for any missing values
when a synchronized range of keys is discovered. This way as nodes are
synchronized they ensure they also have all values for their known keys.
@nathanielc nathanielc force-pushed the feature/ws1-1427-events-synchronize-correctness-tests branch from 067ea62 to 2ada06c Compare January 25, 2024 18:29
@nathanielc nathanielc enabled auto-merge January 25, 2024 18:29
@nathanielc nathanielc added this pull request to the merge queue Jan 25, 2024
Merged via the queue into main with commit 393b876 Jan 25, 2024
4 checks passed
@nathanielc nathanielc deleted the feature/ws1-1427-events-synchronize-correctness-tests branch January 25, 2024 18:51
@github-actions github-actions bot mentioned this pull request Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants