-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Roll forward "Read API Source v2 (#25392)" fix data loss #28778
Conversation
The only conflict is on the removal of Experimental annotation on master, giving this diff (+105 −2,863) different from original PR (+2,867 −105) |
R: @ahmedabu98 |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
IIRC the @vachan-shetty is this mode no longer needed? |
Added For TPC-DS 1T dataset, it gives "Read session returned 1954 streams", and all records are read. See the internal gcp project @vachan-shetty PTAL |
@Abacn @vachan-shetty what are next steps here? |
👍🏽 |
R: @ahmedabu98 rebased onto latest master to resolve merge conflict. PTAL |
Can we update CHANGES.md to explain this breaking changes? |
I thought it is fine to remove it without mention it as last time the addition of |
"If set, BigQueryIO.Read will rely on the Read API backends to surface the appropriate" | ||
+ " number of streams for read") | ||
@Default.Boolean(false) | ||
Boolean getEnableBundling(); | ||
Boolean getEnableStorageReadApiV2(); | ||
|
||
void setEnableBundling(Boolean value); | ||
void setEnableStorageReadApiV2(Boolean value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this option be removed entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The effect of this option is to set the number of stream requested to 0 so the server will decide an appropriate number of read streams (which is Read API source v2). It is preserved for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, makes sense
SpannerChangeStreamOrderedWithinKeyIT.testOrderedWithinKey unrelated flaky test, merging for now |
Fixes #26354
This reverts commit 4ce8eed.
setEnableBundling is defunct since added to Beam repo and opt-in it will result in data loss, see #26354. After #26267 the number of sources returned by split is no longer an issue and BigQueryStorageStreamBundleSource is not needed after all.
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.