-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[yaml] Add enrichment transform to Beam YAML #32286
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
d9bff19
to
7ebba51
Compare
Assigning reviewers. If you would like to opt out of this review, comment R: @shunping for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Thanks for contributing to Beam, @reeba212! I see there is some lint problem you will need to fix in your code: https://github.com/apache/beam/actions/runs/10944933393/job/30387980829?pr=32286 Then, I also see the errors on when import
@Polber, what's the plan for supporting the enrichment transform? To make tests green, we have a few options. For example:
What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments to hopefully get tests green!
Reminder, please take a look at this pr: @shunping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some tests are still failing. Could you please fix them?
1. Added links for different handlers and removed code for unreachable conditions 2. Removed patch decorator in test
b176898
to
c647e47
Compare
@@ -69,7 +69,7 @@ def temp_bigquery_table(project, prefix='yaml_bq_it_'): | |||
dataset_id = '%s_%s' % (prefix, uuid.uuid4().hex) | |||
bigquery_client.get_or_create_dataset(project, dataset_id) | |||
logging.info("Created dataset %s in project %s", dataset_id, project) | |||
yield f'{project}:{dataset_id}.tmp_table' | |||
yield f'{project}.{dataset_id}.tmp_table' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this is reused by EnrichmentTransform's BQ handler which uses the non-legacy form and BigQueryIO accepts both, so this should be safe
from apache_beam.yaml import options | ||
|
||
try: | ||
from apache_beam.transforms.enrichment import Enrichment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps not to fix in this PR (but if not let's file a bug). It seems that Enrichment itself should not require GCP, only specific handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I think the only class imported is a custom Exception class
raise ValueError( | ||
f'Unable to instantiate provider of type {type} ' | ||
f'at line {SafeLineLoader.get_line(spec)}: {exn}') from exn | ||
if isinstance(exn, ModuleNotFoundError) and config.get('requires_gcp', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed in addition to the import guards in yaml_enrichment.py?
It would also be better to get an error on an attempted use rather than a print statement whenever this is imported without [gcp] installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're right this isn't needed anymore. This was my initial solution for another test failure before I realized the imports needed to be disabled entirely to fix another test suite. I think this is redundant and misleading as you said at this point.
@reeba212 I added some comments on what to remove!
@robertwb Do you mind taking another look? |
Reminder, please take a look at this pr: @shunping |
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.