-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove special PyMap, PyFlatMap, etc. #28546
Conversation
* Avoid the use of MetaProviders, which was always kind of hacky. We may want to remove this infrastructure altogether as it does not play nicely with provider inference. * Split MapToFields into separate mapping, filtering, and exploding operations. * Allow MapToFields to act on non-schema'd PCollections. The various langauge flavors of these UDFs are now handled by a preprocessing step. This will make it easier to extend to other langauges, including in particular possible multiple (equivalent) implementations of javascript to minimize cross-langauge boundary crossings.
Prefer to use generic MapToFields instead.
This depends on #28462 |
Codecov Report
@@ Coverage Diff @@
## master #28546 +/- ##
==========================================
- Coverage 72.24% 72.24% -0.01%
==========================================
Files 684 684
Lines 101071 101142 +71
==========================================
+ Hits 73022 73071 +49
- Misses 26470 26492 +22
Partials 1579 1579
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 12 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Assigning reviewers. If you would like to opt out of this review, comment R: @tvalentyn for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
python_callable.PythonCallableWithSource(fn)), | ||
'PyFilter': lambda keep: beam.Filter( | ||
python_callable.PythonCallableWithSource(keep)), | ||
'LogForTesting': lambda: beam.Map(log_and_return), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'LogForTesting': lambda: beam.Map(log_and_return), | |
'Log': lambda: beam.Map(log_and_return), |
any reason to call it ForTesting
? Seems universally useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging every element is certainly not to be encouraged for production pipelines...I might go so far as to consider it an anti-pattern. But I could see it being used to log bad elements or something similar. We can always add it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on all fronts, the log on a certain branch path does seem useful to me. I also kinda think that if you use something named "Log" you'll know what you're getting, and naming it "LogForTesting" doesn't really discourage use. I'm fine deferring this though
Run Python_Runners PreCommit |
Prefer to use generic MapToFields instead.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.