use direct executor to deflake tests #33187
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
Assigning reviewers. If you would like to opt out of this review, comment
R: @Abacn added as fallback since no labels match configuration
Available commands:
The PR bot will only process comments in the main thread (not review comments).
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
Coverage Diff, master vs. #33187 (no change in any metric)
             master    #33187
Coverage     58.93%    58.93%
Complexity   3112      3112
Files        1133      1133
Lines        174989    174989
Branches     3343      3343
Hits         103136    103136
Misses       68508     68508
Partials     3345      3345
R: @Abacn
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
Thanks for the fix. I think we need to understand whether the test failure was purely a testing issue or could also happen in production.
- getDataMetricTracker);
+ getDataMetricTracker,
+ // Run the workerMetadataConsumer on the direct calling thread to make testing more
+ // deterministic.
"to make testing more deterministic" gives the impression that the change just fixes tests; however, the test code path then diverges from the real one.
Please explain in this comment why the race observed in the test does not affect production, for future reference.
If this indeed could happen in production then we should fix the code.
In prod, this hands off the task from a thread that may perform network IO; we do not want the task to block that thread, since the task acquires a lock to do its work. The hand-off is not needed in testing, where the task can logically be called inline.
Added comment.
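The reasoning above (hand the lock-holding work off an IO thread in production, run it inline in tests) can be sketched as follows. This is a hypothetical, simplified illustration and not the Beam implementation: `MetadataHandoff`, `onMetadata`, and `consume` are invented names, and `Runnable::run` stands in for Guava's `MoreExecutors.directExecutor()`, which likewise runs each task on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: metadata arrives on a (possibly network IO) thread and the
 * lock-holding consumer is handed off to an injected Executor.
 */
public class MetadataHandoff {
  private final Object lock = new Object();
  private final List<String> applied = new ArrayList<>();
  private final Executor consumerExecutor;

  MetadataHandoff(Executor consumerExecutor) {
    this.consumerExecutor = consumerExecutor;
  }

  /** Called on the IO thread; it should not block on the lock taken in consume(). */
  void onMetadata(String metadata) {
    consumerExecutor.execute(() -> consume(metadata));
  }

  /** The consumer takes a lock to do its work, so it may block. */
  private void consume(String metadata) {
    synchronized (lock) {
      applied.add(metadata);
    }
  }

  List<String> applied() {
    synchronized (lock) {
      return new ArrayList<>(applied);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Production-style wiring: hand off to another thread so the IO thread never blocks.
    ExecutorService pool = Executors.newSingleThreadExecutor();
    MetadataHandoff prod = new MetadataHandoff(pool);
    prod.onMetadata("endpoints-v1");
    pool.shutdown();
    // The caller has to wait for the hand-off before it can observe the result.
    pool.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println(prod.applied());

    // Test-style wiring: the task runs inline, so the result is visible immediately.
    MetadataHandoff test = new MetadataHandoff(Runnable::run);
    test.onMetadata("endpoints-v1");
    System.out.println(test.applied());
  }
}
```

Only the injected `Executor` differs between the two wirings, which is why the reviewer's concern applies: the test path skips the thread hop that production relies on.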
@@ -85,7 +86,9 @@ static ChannelCache forTesting(
  notification -> {
    shutdownChannel(notification.getValue());
    onChannelShutdown.run();
- });
+ },
+ // Run the removal on the calling thread for better determinism in tests.
same here
Added. This doesn't change any behavior; we just want the removal to run synchronously so we don't have to rely on waiting in tests.
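Why a synchronous removal removes the need to wait can be sketched with a tiny cache whose removal callback is dispatched through an `Executor`. This is a hypothetical stand-in for `ChannelCache`: `NotifyingCache`, `put`, and `invalidate` are invented names. Guava's `RemovalListeners.asynchronous(listener, executor)` wraps a listener in a similar way, but whether `ChannelCache` uses that exact mechanism is an assumption here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical minimal cache whose removal notifications are dispatched on an Executor. */
public class NotifyingCache<K, V> {
  private final Map<K, V> entries = new ConcurrentHashMap<>();
  private final Consumer<V> onRemoval;
  private final Executor removalExecutor;

  NotifyingCache(Consumer<V> onRemoval, Executor removalExecutor) {
    this.onRemoval = onRemoval;
    this.removalExecutor = removalExecutor;
  }

  void put(K key, V value) {
    entries.put(key, value);
  }

  void invalidate(K key) {
    V removed = entries.remove(key);
    if (removed != null) {
      // With a direct executor the callback finishes before invalidate() returns;
      // with a thread pool it runs whenever that pool's thread gets scheduled.
      removalExecutor.execute(() -> onRemoval.accept(removed));
    }
  }

  public static void main(String[] args) {
    List<String> shutDown = new CopyOnWriteArrayList<>();
    NotifyingCache<String, String> cache = new NotifyingCache<>(shutDown::add, Runnable::run);
    cache.put("worker-1", "channel-1");
    cache.invalidate("worker-1");
    // No sleep or latch needed: the removal callback has already completed.
    System.out.println(shutDown);
  }
}
```

The callback logic is identical in both modes; only when it runs relative to `invalidate()` changes.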
We've seen a similar scenario in other tests. This happens because CI/CD machines are often busier and under heavier CPU/thread pressure, which arguably resembles production workers more closely.
This has only shown up in these test suites (we haven't run into it in load testing). I wonder if it's due to threads waiting to be scheduled while resources are consumed by other tests executing at the same time.
Done, back to you @Abacn, thanks!
Thank you!
MoreExecutors.directExecutor()/directExecutorService runs all tasks on the calling thread (without offloading async work to another thread), so calls to submit and execute block until the submitted task returns (i.e. until Runnable.run() returns).
Use this in the test implementations of ChannelCache and FanOutStreamingEngineWorkerHarness to prevent threads from waiting on each other. The old implementation seems to work locally, but it has become increasingly flaky in the test runner environment.
Flakiness is referenced in #28957
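The direct-executor semantics described above can be checked with a small sketch. `DirectExecutorDemo` and `threadThatRan` are hypothetical names; `Runnable::run` is used as a JDK-only equivalent of Guava's `MoreExecutors.directExecutor()`, since both run the task on the calling thread before `execute` returns. The pooled case shows the timed wait a test otherwise needs, which is where a heavily loaded CI machine can time out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class DirectExecutorDemo {
  /** Returns the name of the thread that ran a task submitted to {@code executor}. */
  static String threadThatRan(Executor executor) throws InterruptedException {
    AtomicReference<String> ranOn = new AtomicReference<>();
    CountDownLatch done = new CountDownLatch(1);
    executor.execute(() -> {
      ranOn.set(Thread.currentThread().getName());
      done.countDown();
    });
    // With a direct executor the latch is already at zero here.
    // With a pool, this timed wait depends on the pool thread being scheduled in time.
    if (!done.await(5, TimeUnit.SECONDS)) {
      throw new IllegalStateException("task did not finish in time");
    }
    return ranOn.get();
  }

  public static void main(String[] args) throws InterruptedException {
    String caller = Thread.currentThread().getName();

    Executor direct = Runnable::run; // same semantics as MoreExecutors.directExecutor()
    System.out.println(threadThatRan(direct).equals(caller)); // true

    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      System.out.println(threadThatRan(pool).equals(caller)); // false
    } finally {
      pool.shutdown();
    }
  }
}
```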