Parameterize GoogleCloudStorage provider in GcsUtil to unblock gcs-co… #33368
base: master
GoogleCloudStorage get(
    GoogleCloudStorageOptions options,
    Storage storage,
    Credentials credentials,
    HttpRequestInitializer httpRequestInitializer);
These 4 params should cover both the 2.x constructor and the 3.x Builder.
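A minimal sketch of how a single four-parameter provider could serve both construction styles. Every type here (`Options`, `Storage`, `Credentials`, `HttpRequestInitializer`, `Impl2x`, `Impl3x` and its Builder setters) is a hypothetical stand-in, not the real gcs-connector or Beam class, and which parameters each path actually consumes is an assumption:

```java
import java.util.Objects;

/** Hypothetical sketch; all nested types are stand-ins, NOT gcs-connector or Beam classes. */
public class GcsProviderSketch {

  // Stand-ins for GoogleCloudStorageOptions, Storage, Credentials, HttpRequestInitializer.
  static final class Options {}
  static final class Storage {}
  static final class Credentials {}
  static final class HttpRequestInitializer {}

  // Stand-in for the GoogleCloudStorage interface the provider returns.
  interface GoogleCloudStorage {
    String describe();
  }

  // Stand-in for a 2.x-style implementation that exposes a public constructor.
  static final class Impl2x implements GoogleCloudStorage {
    public Impl2x(Options options, Storage storage, Credentials credentials) {
      Objects.requireNonNull(options, "options");
      Objects.requireNonNull(storage, "storage");
    }

    @Override
    public String describe() { return "2.x constructor"; }
  }

  // Stand-in for a 3.x-style implementation reachable only through a Builder.
  static final class Impl3x implements GoogleCloudStorage {
    private Impl3x() {}

    @Override
    public String describe() { return "3.x builder"; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
      private Options options;
      private Credentials credentials;
      private HttpRequestInitializer initializer;

      Builder setOptions(Options o) { this.options = o; return this; }
      Builder setCredentials(Credentials c) { this.credentials = c; return this; }
      Builder setHttpRequestInitializer(HttpRequestInitializer i) { this.initializer = i; return this; }

      Impl3x build() {
        Objects.requireNonNull(options, "options"); // hypothetical validation
        return new Impl3x();
      }
    }
  }

  // Provider shape from the review: four parameters, enough for either construction path.
  @FunctionalInterface
  interface GoogleCloudStorageProvider {
    GoogleCloudStorage get(
        Options options,
        Storage storage,
        Credentials credentials,
        HttpRequestInitializer httpRequestInitializer);
  }

  // Default: mirrors a 2.x public constructor (the initializer is simply unused here).
  static final GoogleCloudStorageProvider DEFAULT_2X =
      (options, storage, credentials, init) -> new Impl2x(options, storage, credentials);

  // An override a 3.x user might supply: go through the Builder instead.
  static final GoogleCloudStorageProvider BUILDER_3X =
      (options, storage, credentials, init) ->
          Impl3x.builder()
              .setOptions(options)
              .setCredentials(credentials)
              .setHttpRequestInitializer(init)
              .build();

  public static void main(String[] args) {
    Options o = new Options();
    Storage s = new Storage();
    Credentials c = new Credentials();
    HttpRequestInitializer i = new HttpRequestInitializer();
    System.out.println(DEFAULT_2X.get(o, s, c, i).describe()); // 2.x constructor
    System.out.println(BUILDER_3X.get(o, s, c, i).describe()); // 3.x builder
  }
}
```

Passing the union of both paths' inputs keeps the provider signature stable; each implementation just ignores what it does not need.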
cc @Abacn: wdyt of this workaround for gcs-connector 3.x?
Hi, thanks for the investigation. Is the builder also supported on 2.x? If so, we can just switch to it in all cases, with no need to expose extra options to the user.
Unfortunately it isn't :/ There is no way to construct a …
Understood, thanks!
Thanks @Abacn! Should I update CHANGES.md for the …
Java PreCommit timed out on several reruns. It's passing on HEAD; could you please take a look at whether it is related to this change?
After multiple reruns, there is indeed a related test failure: https://github.com/apache/beam/runs/34888346578
It looks like the test is coded to prevent the exposure of unwanted classes. Refactoring in a way that does not leak these classes may fix it.
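For readers unfamiliar with this kind of test, here is a rough sketch of a reflection-based API-surface check, assuming it works by flagging public method signatures that mention types from packages outside an allowlist. `ApiSurfaceSketch`, `findLeaks`, and the prefix-matching rule are hypothetical and are not Beam's actual test code:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of an API-surface check; not Beam's real test. */
public class ApiSurfaceSketch {

  // Returns "Class#method -> type" for each public signature type outside the allowed prefixes.
  static List<String> findLeaks(Class<?> clazz, Set<String> allowedPackagePrefixes) {
    List<String> leaks = new ArrayList<>();
    for (Method m : clazz.getDeclaredMethods()) {
      if (!Modifier.isPublic(m.getModifiers())) {
        continue; // only the public surface is checked
      }
      List<Class<?>> types = new ArrayList<>();
      types.add(m.getReturnType());
      for (Class<?> p : m.getParameterTypes()) {
        types.add(p);
      }
      for (Class<?> t : types) {
        Class<?> base = t;
        while (base.isArray()) {
          base = base.getComponentType(); // check the element type of arrays
        }
        if (base.isPrimitive()) {
          continue; // primitives and void never leak
        }
        String pkg = base.getPackage() == null ? "" : base.getPackage().getName();
        boolean allowed = allowedPackagePrefixes.stream().anyMatch(pkg::startsWith);
        if (!allowed) {
          leaks.add(clazz.getSimpleName() + "#" + m.getName() + " -> " + base.getName());
        }
      }
    }
    return leaks;
  }

  // Example: ok() only uses java.lang types, leaky() exposes java.util.concurrent.
  public static class Example {
    public String ok(String s) { return s; }
    public java.util.concurrent.Executor leaky() { return Runnable::run; }
  }

  public static void main(String[] args) {
    // prints [Example#leaky -> java.util.concurrent.Executor]
    System.out.println(findLeaks(Example.class, Set.of("java.lang")));
  }
}
```

Under this model, any public interface method whose parameters are gcs-connector types (as in the provider signature under review) would be reported. The prefix rule is deliberately crude; a real check would likely also walk fields, supertypes, and generic type arguments.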
Thanks for looking into it @Abacn! Hmm, this seems challenging to refactor, since the exposure is coming from …
Rationale:
I would like to use gcs-connector 3.x, which supports the new Parquet VectorIO feature. However, gcs-connector 3.x also drops Java 8 and targets Java 11, which blocks us from upgrading it directly in Beam, since Beam still targets Java 8 (see #31678).
Additionally, as a Beam user, I can't just upgrade gcs-connector on my end, due to breaking changes in how `GoogleCloudStorageImpl` is instantiated: in 2.x it has public constructors, but in 3.x it drops the public constructors and enforces a Builder pattern. Therefore, when running on gcs-connector 3.x, my pipeline throws a `NoSuchMethodError` from `org.apache.beam.sdk.extensions.gcp.util.GcsUtil` when it tries to invoke the 2.x constructor: https://github.com/apache/beam/blob/v2.61.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L727

This PR adds a pipeline option for a GoogleCloudStorage provider, so that users who want to use gcs-connector 3.x are no longer blocked. It defaults to invoking the gcs-connector 2.x public constructor, but 3.x users can override it to use the Builder.
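The option-with-default behaviour described above can be sketched as follows, assuming a getter that falls back to a default provider when the user has set none. `SketchOptions`, `StorageClientProvider`, `createClient`, and the string labels are hypothetical stand-ins; Beam's real `PipelineOptions` machinery is not reproduced here:

```java
/** Hypothetical sketch of "an option carrying a provider, with a default". Not Beam's API. */
public class ProviderOptionSketch {

  // Stand-in for the storage client the utility needs.
  interface StorageClient {
    String origin();
  }

  // Stand-in for the pluggable provider carried by the options.
  @FunctionalInterface
  interface StorageClientProvider {
    StorageClient get(String projectId);
  }

  // Default path, standing in for "invoke the 2.x public constructor".
  static final StorageClientProvider DEFAULT_PROVIDER =
      projectId -> () -> "default(" + projectId + ")";

  // Stand-in for a pipeline options interface: unset means "use the default".
  static final class SketchOptions {
    private StorageClientProvider provider; // null until the user sets one

    StorageClientProvider getProvider() {
      return provider != null ? provider : DEFAULT_PROVIDER;
    }

    void setProvider(StorageClientProvider provider) {
      this.provider = provider;
    }
  }

  // Stand-in for GcsUtil: it never constructs the client itself, it asks the provider.
  static StorageClient createClient(SketchOptions options, String projectId) {
    return options.getProvider().get(projectId);
  }

  public static void main(String[] args) {
    SketchOptions options = new SketchOptions();
    System.out.println(createClient(options, "my-project").origin()); // default(my-project)

    // A user on the newer connector overrides the provider (standing in for the Builder path).
    options.setProvider(projectId -> () -> "builder(" + projectId + ")");
    System.out.println(createClient(options, "my-project").origin()); // builder(my-project)
  }
}
```

One reason this shape should help: the JVM specification only lets a linkage error such as `NoSuchMethodError` surface at the point where the missing method is actually used, so a default provider that references the 2.x constructor but is never invoked should not fail on a 3.x classpath.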