
feat: retry delay strategy #871

Merged (13 commits into main, May 1, 2024)

Conversation

krpeacock
Contributor

@krpeacock krpeacock commented Apr 2, 2024

Description

Developers have noticed that the agent errors more frequently since the new watermark protections against replay attacks and stale data were introduced. This feature adds a delay strategy for retries that gives nodes more time to catch up, with exponentially increasing delays between attempts.

Fixes SDK-1562
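The strategy described above can be sketched as follows. This is a hypothetical illustration: the function names, base delay, and cap are assumptions for the example, not the actual agent-js API added in this PR.

```typescript
// Hypothetical exponential-backoff delay strategy (illustrative only;
// the real implementation in this PR may use different names and values).
function exponentialDelayMs(attempt: number, baseMs = 300, maxMs = 10_000): number {
  if (attempt === 0) return 0; // the first attempt fires immediately
  // Each retry doubles the previous wait, capped at maxMs.
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}

// Retry a request, waiting per the delay strategy between attempts.
async function withRetries<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const delay = exponentialDelayMs(attempt);
    if (delay > 0) await new Promise((resolve) => setTimeout(resolve, delay));
    try {
      return await fn();
    } catch (e) {
      lastError = e; // swallow and retry until attempts are exhausted
    }
  }
  throw lastError;
}
```

With these assumed defaults, a request that keeps failing would wait 300 ms, 600 ms, and 1200 ms before the three retries, then surface the final error.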

How Has This Been Tested?

New e2e tests.

Checklist:

  • My changes follow the guidelines in CONTRIBUTING.md.
  • The title of this PR complies with Conventional Commits.
  • I have edited the CHANGELOG accordingly.
  • I have made corresponding changes to the documentation.

@krpeacock krpeacock requested a review from a team as a code owner April 2, 2024 18:39
Contributor

github-actions bot commented Apr 2, 2024

size-limit report 📦

Path Size
@dfinity/agent 85.62 KB (+0.93% 🔺)
@dfinity/candid 13.58 KB (0%)
@dfinity/principal 4.97 KB (0%)
@dfinity/auth-client 60.71 KB (+1.4% 🔺)
@dfinity/assets 80.24 KB (+0.88% 🔺)
@dfinity/identity 57.92 KB (+1.34% 🔺)
@dfinity/identity-secp256k1 265.65 KB (+0.2% 🔺)

packages/agent/src/agent/http/index.ts — outdated, resolved
packages/agent/src/polling/strategy.ts — outdated, resolved

@ByronBecker ByronBecker left a comment


Exponential backoff with jitter should work fine for most cases, but assuming no jitter and 3 retries, that's a call at 0 ms, 150 ms, 600 ms, and 1200 ms (am I correct here?).

It would be nice to see if this covers every case, or if there would still be a super small chance that this error might get hit (even with the exponential backoff).

Timestamp failed to pass the watermark after retrying the configured 3 times. We cannot guarantee the integrity of the response since it could be a replay attack.

If this error got hit with the exponential backoff in place, what would that mean for the FE app? Is the subnet down or behind? Did we just get super unlucky? Does the same error message make sense now that the exponential backoff is in place (i.e. does this error have a different meaning)?

Ideally, this error would never happen by accident, or even if the FE client user has a poor internet connection. Then if they receive the error it would actually suggest something is not healthy with the nodes/subnet they've communicated with.

Putting a higher threshold in by using exponential backoff and hoping that this is a rare case is just going to confuse developers, and especially confuse end users, in the event this error is hit and a toast component pops up with "Timestamp failed to pass the watermark after retrying the configured 3 times. We cannot guarantee the integrity of the response since it could be a replay attack" (in fact, this sort of sounds like a security issue).

Contributor

@dfx-json dfx-json left a comment


LGTM

@krpeacock krpeacock dismissed ericswanson-dfinity’s stale review May 1, 2024 21:04

I adopted Eric's suggestions

@krpeacock krpeacock merged commit a38f3d5 into main May 1, 2024
16 checks passed
@krpeacock krpeacock deleted the kai/SDK-1562-watermark-retry-delay branch May 1, 2024 21:04