Add various utility meta-transforms to Beam. #32445
Assigning reviewers. If you would like to opt out of this review, comment R: @liferoad for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments).
Shall we add this to https://beam.apache.org/documentation/programming-guide/#flatten as one of the core PTransforms?
Added a note about the Flatten alternative. I don't think
0be36ca to 58e0229
These look really convenient, especially for pipelines that might write out intermediate results or merge older PCollections (something the schrodinger use cases do a lot).
I personally discover useful transforms through the Beam transform catalog, so it'd be nice if some examples were included there.
Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment R: @tvalentyn for label python. Available commands:
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Flatten.java
@Override
public String getKindString() {
  return "Flatten.With";
Merge.With might be a possible alternative name, but maybe it adds more confusion since we already have a pre-existing Flatten for a similar concept.
Yeah, I chose this name because it is literally syntactic sugar for the same primitive Flatten operation. (Personally, I'd prefer disjoint union, but that's probably too obscure, let alone too late to change now...)
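To make the "syntactic sugar" point concrete, here is a minimal plain-Python sketch. It is a toy model, not the Beam API: the `FlattenWith` class and `flatten` function below are hypothetical stand-ins that operate on ordinary lists, and they only illustrate that the chaining form and the primitive form yield the same multiset of elements.

```python
from collections import Counter


def flatten(collections):
    """Toy stand-in for the primitive Flatten: multiset union of all inputs."""
    return [x for coll in collections for x in coll]


class FlattenWith:
    """Toy stand-in (not the Beam class): merges the piped-in list with others."""

    def __init__(self, *others):
        self.others = others

    def __ror__(self, left):
        # Called for `left | FlattenWith(...)` because lists do not define `|`.
        return flatten((left, *self.others))


pcoll1, pcoll2, pcoll3 = [1, 2], [2, 3], [4]

# Chaining form: the main input flows in from the left.
chained = pcoll1 | FlattenWith(pcoll2, pcoll3)

# Primitive form: all inputs are gathered into a tuple first.
primitive = flatten((pcoll1, pcoll2, pcoll3))

# Element order is not meaningful, so compare as multisets.
assert Counter(chained) == Counter(primitive)
```

Note that duplicates are kept (the element `2` appears twice), which is why "disjoint union" is an apt description of the operation.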
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Flatten.java
pcoll1 = partitioned[0]
pcoll2 = partitioned[1]
pcoll3 = partitioned[2]
SomeTransform = lambda: beam.Map(lambda x: x)
Should we have another example for merging with a transform output? Feel free to create a bug to add it. Examples are just as important as having the capability, so I think we should highlight these everywhere (Beam Playground, snippets, website docs, etc.). This can be done with follow-up/starter bugs if you don't have time to do all that in one change.
Added this example, filed #32840 for follow-up. It would be good to think about how we could structure things to further reduce redundancy between these various forms of documentation.
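For readers wondering what "merging with a transform output" might look like, the following plain-Python sketch models the idea. It is not Beam code: `flatten_with` and `create` are hypothetical names, and the assumption that a transform argument is expanded on its own (as a root source, rather than applied to the main input) is mine and should be checked against the merged implementation.

```python
def create(values):
    """Hypothetical stand-in for a root transform: produces elements when expanded."""
    return lambda: list(values)


def flatten_with(main, *others):
    """Toy model: merge `main` with existing collections and/or transform outputs."""
    merged = list(main)
    for other in others:
        # A callable models a transform that must be expanded to obtain its
        # output; anything else is treated as an already-existing collection.
        produced = other() if callable(other) else other
        merged.extend(produced)
    return merged


existing = ["c"]
merged = flatten_with(["a", "b"], existing, create(["x", "y", "z"]))
```

The design point this illustrates is that the caller does not need to materialize the transform's output as a separate named collection before merging it.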
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Tee.java
Thanks for taking a look. I added one more example, but filed an issue for further documentation in the interest of not blocking things.
@robertwb do we need an issue or commit for this as well? PTAL at the website failures, and please look at the staged website content before merging to check that the changes reflect your intent. The link should be available in the Summary tab of the Stage_GCS GitHub Action run. LGTM otherwise, thanks!
Thanks. Looking into the website failures...
Update the description to mention both.
These were inspired by some discussions at the Beam summit.