feat: use custom exponential sleep generator #860
Conversation
@@ -89,21 +90,21 @@ def __init__(
     bt_exceptions._MutateRowsIncomplete,
 )
 # build retryable operation
-retry = retries.AsyncRetry(
+retry_wrapped = functools.partial(
+    retries.retry_target,
I have been looking at https://www.learnpython.org/en/Partial_functions to try to understand this a little bit better. I guess we are expecting this retries.retry_target to be used in place of the retry function defined before?
Yes, this will be a little complicated without context.
AsyncRetry is just a wrapper around retries.retry_target, but AsyncRetry doesn't expose the sleep_generator argument we need to customize here. So I'm replacing the AsyncRetry instance with the lower-level retries.retry_target.
The partial allows us to prepare the function's arguments without calling it yet. Later, we can call retry_wrapped with no arguments, and it will invoke retry_target with everything we passed into the partial here. This is useful for things like retries, where we may not always have all the context to rebuild the entire operation when we want to launch a new attempt.
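For anyone less familiar with the pattern, here is a minimal, runnable sketch of what functools.partial is doing; the retry_target and operation below are toy stand-ins, not the real google.api_core functions.

```python
import functools


def retry_target(target, predicate, sleep_generator, timeout):
    # Toy stand-in for a low-level retry loop: the real one retries `target`
    # while `predicate` says the error is retryable, sleeping between attempts.
    return target()


def operation():
    return "row mutated"


# Bind all the arguments now, without calling anything yet.
retry_wrapped = functools.partial(
    retry_target,
    operation,
    lambda exc: False,      # predicate: nothing is retryable in this toy example
    iter([0.1, 0.2, 0.4]),  # sleep_generator: any iterable of sleep times
    timeout=60,
)

# Later, a fresh attempt can be launched with no extra context:
print(retry_wrapped())  # -> "row mutated"
```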
Looks good. Just a few nits/cleanup suggested.
@@ -62,6 +63,51 @@ def _attempt_timeout_generator(
     yield max(0, min(per_request_timeout, deadline - time.monotonic()))


+def _exponential_sleep_generator(
If the min increase is 0 and the multiplier is 1, then this would loop forever, right? Maybe an error message for this case might be useful.
This is a generator function, so there is no risk of an infinite loop. The method "returns" at the yield line, only to pick up there again the next time next(this_generator) is called.
If the min increase is 0 and the multiplier is 1, it will continuously yield the initial value every time you call next, but it won't block.
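A toy illustration of that point (the parameter names mirror the discussion here, but this is not the PR's implementation): the while True loop suspends at yield, so each next() returns immediately.

```python
import itertools


def toy_sleep_generator(initial=1, multiplier=1, min_increase=0):
    value = initial
    while True:
        yield value  # execution pauses here until the next next() call
        value = max(value * multiplier, value + min_increase)


gen = toy_sleep_generator()
# Same value forever with multiplier=1 and min_increase=0, but it never blocks.
print(list(itertools.islice(gen, 4)))  # [1, 1, 1, 1]
```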
tests/unit/data/test__helpers.py
((1, 3, 10, 0.5), [1, 1.5, 2, 2.5, 3, 3]),  # test with larger multiplier
((1, 25, 1.5, 5), [1, 6, 11, 16, 21, 25]),  # test with larger min increase
((1, 5, 1, 0), [1, 1, 1, 1]),  # test with multiplier of 1
((1, 5, 1, 1), [1, 2, 3, 4]),  # test with min_increase with multiplier of 1
nit: test with min_increase 1 and multiplier of 1.
[
    ((), [0.01, 0.02, 0.03, 0.04, 0.05]),  # test defaults
    ((1, 3, 2, 1), [1, 2, 3, 3, 3]),  # test hitting limit
    ((1, 3, 2, 0.5), [1, 1.5, 2, 2.5, 3, 3]),  # test with smaller min_increase
Why don't the tests seem to be using the multiplier?
The next_sleep value is calculated using random.uniform(lower_bound, upper_bound). This test mocks the random function to always return the lower bound, to make testing easier (see line 101). The test below this mocks it to always return the upper bound, which is where you'll see the multiplier's effect.
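Roughly what that mocking looks like (illustrative only; the real test patches the helper module and compares against the expected sequences in the table above):

```python
import random
from unittest import mock

# Pin random.uniform to its lower bound so the generated sleeps are deterministic.
with mock.patch("random.uniform", side_effect=lambda low, high: low):
    print(random.uniform(1, 5))   # -> 1
    print(random.uniform(2, 10))  # -> 2
```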
Holding off on this for now: there's talk of modifying the flaky test, which would remove the need for this change: googleapis/cloud-bigtable-clients-test#115 (comment)
It will be done today to resolve the flakiness. Later, we can explore checking the jittering more reliably (e.g. doing more retries and observing the trend); it's not easy or quick, for sure.
Ok, it sounds like the jitter used by Python is acceptable behaviour, so I will close this PR rather than changing the logic here. Thanks!
The conformance tests are currently flaky, because they expect sleeps between failed requests to always increase. By default, gapic libraries use a truncated backoff algorithm, where sleeps are drawn using random.uniform(0, max_sleep), with max_sleep increasing exponentially. This means that later sleeps may end up being shorter than the ones before.

This PR addresses the test failure by using a custom backoff generator, which resembles the default, but with the added property that the lower bound also updates so that sleeps increase over time.
#859
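For context, a minimal sketch of the two behaviours described above; the parameter names and the exact update formula are illustrative assumptions, not the PR's actual code.

```python
import itertools
import random


def default_backoff(initial=0.01, maximum=60.0, multiplier=2.0):
    # Truncated exponential backoff: each sleep is drawn from [0, max_sleep],
    # so a later sleep can be shorter than an earlier one.
    max_sleep = initial
    while True:
        yield random.uniform(0, max_sleep)
        max_sleep = min(max_sleep * multiplier, maximum)


def increasing_backoff(initial=0.01, maximum=60.0, multiplier=2.0, min_increase=0.01):
    # Hypothetical variant where the lower bound also advances, so each sleep is
    # at least min_increase longer than the previous one until the cap is hit.
    lower = upper = initial
    while True:
        sleep = random.uniform(lower, upper)
        yield sleep
        lower = min(sleep + min_increase, maximum)
        upper = min(max(sleep * multiplier, lower), maximum)


# Sleeps still jitter, but the sequence trends upward instead of bouncing around.
print([round(s, 3) for s in itertools.islice(increasing_backoff(), 5)])
```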