
feat: use custom exponential sleep generator #860

Closed

Conversation

daniel-sanche
Contributor

The conformance tests are currently flaky because they expect sleeps between failed requests to always increase. By default, GAPIC libraries use a truncated exponential backoff algorithm, where each sleep is drawn using random.uniform(0, max_sleep), with max_sleep increasing exponentially. This means that later sleeps may end up shorter than the ones before them.

This PR addresses the test failure by using a custom backoff generator that resembles the default, but with the added property that the lower bound also increases, so that sleeps grow over time.
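As a rough sketch of the idea (the function name, signature, and defaults here are illustrative, not the PR's actual code): the lower bound ratchets up to at least the previous sleep, so draws never shrink, while the upper bound still grows exponentially up to a cap.

```python
import random


def bounded_exponential_sleeps(
    initial=0.01, maximum=60.0, multiplier=2.0, min_increase=0.0
):
    """Illustrative sketch: exponential backoff whose lower bound also
    ratchets upward, so each sleep is at least as long as the last."""
    lower = initial
    upper = initial
    while True:
        sleep = random.uniform(lower, upper)
        yield sleep
        # the next draw starts no lower than this sleep (plus min_increase)...
        lower = min(maximum, max(sleep + min_increase, lower))
        # ...while the upper bound grows exponentially, capped at maximum
        upper = min(maximum, max(upper * multiplier, lower))
```

With min_increase > 0 the yielded values are non-decreasing by construction, which is the property the conformance test expects.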

#859

@daniel-sanche daniel-sanche requested review from a team as code owners August 18, 2023 21:52
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigtable Issues related to the googleapis/python-bigtable API. labels Aug 18, 2023
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: xl Pull request size is extra large. labels Aug 18, 2023
@daniel-sanche daniel-sanche changed the base branch from v3 to experimental_v3 August 30, 2023 20:39
@@ -89,21 +90,21 @@ def __init__(
bt_exceptions._MutateRowsIncomplete,
)
# build retryable operation
-        retry = retries.AsyncRetry(
+        retry_wrapped = functools.partial(
+            retries.retry_target,

I have been looking at https://www.learnpython.org/en/Partial_functions to try to understand this a little better. I guess we are expecting this retries.retry_target to take the place of the retry function defined before?

Contributor Author

Yes, this will be a little complicated without context.

AsyncRetry is just a wrapper around retries.retry_target, but AsyncRetry doesn't expose the sleep_generator argument we need to customize here. So I'm replacing the AsyncRetry instance with the lower-level retries.retry_target.

The partial allows us to prepare the function's arguments without calling it yet. Later, we can call retry_wrapped with no arguments, and it will invoke retry_target with everything we passed into the partial here. This is useful for retries, where we may not always have all the context needed to rebuild the entire operation when launching a new attempt.
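A minimal illustration of the pattern (the retry_target signature below is a simplified stand-in for api_core's, not its real definition):

```python
import functools


def retry_target(target, predicate, sleep_generator, timeout):
    """Simplified stand-in for api_core's retries.retry_target."""
    return target()


# bind the arguments now; nothing is called yet
retry_wrapped = functools.partial(
    retry_target,
    target=lambda: "response",
    predicate=lambda exc: True,
    sleep_generator=iter([0.01, 0.02, 0.04]),
    timeout=60.0,
)

# later, invoking it with no arguments replays everything bound above
result = retry_wrapped()
```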

@danieljbruce danieljbruce left a comment

Looks good. Just a few nits/cleanup suggestions.

@@ -62,6 +63,51 @@ def _attempt_timeout_generator(
yield max(0, min(per_request_timeout, deadline - time.monotonic()))


def _exponential_sleep_generator(

If the min increase is 0 and the multiplier is 1 then this would loop forever right? Maybe an error message for this case might be useful.

Contributor Author

This is a generator function, so there is no risk of an infinite loop: execution suspends at the yield line and only resumes the next time next(this_generator) is called.

If the min increase is 0 and the multiplier is 1, it will keep yielding the initial value on every call to next, but it won't block.
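To illustrate why the infinite loop inside a generator doesn't hang (this toy constant_sleeps generator is hypothetical, not code from the PR):

```python
def constant_sleeps(initial=1.0):
    # an infinite loop, but execution pauses at yield after each value,
    # so creating the generator and pulling a few values never blocks
    while True:
        yield initial


gen = constant_sleeps()
first_three = [next(gen) for _ in range(3)]  # pulls exactly three values
```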

((1, 3, 10, 0.5), [1, 1.5, 2, 2.5, 3, 3]), # test with larger multiplier
((1, 25, 1.5, 5), [1, 6, 11, 16, 21, 25]), # test with larger min increase
((1, 5, 1, 0), [1, 1, 1, 1]), # test with multiplier of 1
((1, 5, 1, 1), [1, 2, 3, 4]), # test with min_increase with multiplier of 1

nit: test with min_increase 1 and multiplier of 1.

[
((), [0.01, 0.02, 0.03, 0.04, 0.05]), # test defaults
((1, 3, 2, 1), [1, 2, 3, 3, 3]), # test hitting limit
((1, 3, 2, 0.5), [1, 1.5, 2, 2.5, 3, 3]), # test with smaller min_increase

Why don't the tests seem to be using the multiplier?

Contributor Author

The next_sleep value is calculated using random.uniform(lower_bound, upper_bound). This test mocks the random function to always return the lower bound, to make testing easier (see line 101). The test below it mocks it to always return the upper bound, which is where you'll see the multiplier's effect.
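The mocking described above can be sketched like this (sample_sleep is a hypothetical stand-in for the generator's draw, not the test's actual helper):

```python
import random
from unittest import mock


def sample_sleep(lower_bound, upper_bound):
    # same shape as the generator's draw: random.uniform(lower, upper)
    return random.uniform(lower_bound, upper_bound)


# pin random.uniform to its lower bound: the multiplier has no visible effect
with mock.patch("random.uniform", side_effect=lambda lo, hi: lo):
    low = sample_sleep(2, 8)

# pin it to the upper bound instead: now the exponential growth shows up
with mock.patch("random.uniform", side_effect=lambda lo, hi: hi):
    high = sample_sleep(2, 8)
```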

@daniel-sanche
Contributor Author

Holding off on this for now: There's talk of modifying the flaky test, which would remove the need for this change: googleapis/cloud-bigtable-clients-test#115 (comment)

@liujiongxin

Holding off on this for now: There's talk of modifying the flaky test, which would remove the need for this change: googleapis/cloud-bigtable-clients-test#115 (comment)

It will be done to resolve the flakiness (today). Later, we can explore checking the jittering more reliably (e.g. doing more retries and observing the trend, etc.). It's not easy/quick, for sure.

@daniel-sanche
Contributor Author

It will be done to resolve the flakiness (today). Later, we can explore checking the jittering more reliably (e.g. doing more retries and observe the trend, etc. It's not easy/quick for sure)

Ok, it sounds like the jitter used by Python is acceptable behaviour, so I will close this PR rather than changing the logic here. Thanks!
