Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[yaml] Add Beam YAML Examples and Getting started docs #30003

Merged
merged 12 commits into from
Mar 8, 2024

Conversation

Polber
Copy link
Contributor

@Polber Polber commented Jan 12, 2024

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@Polber
Copy link
Contributor Author

Polber commented Jan 12, 2024

R: @robertwb

Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@Polber Polber force-pushed the jkinard/yaml-examples branch 2 times, most recently from 5a3b140 to 712c9ae Compare January 23, 2024 19:07
@Polber Polber requested a review from robertwb January 31, 2024 19:18
@Polber Polber force-pushed the jkinard/yaml-examples branch from 712c9ae to 9af22dc Compare January 31, 2024 19:59
Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good.

Let's remove the docs changes as we're moving those to the site and get these examples in.

@robertwb
Copy link
Contributor

Any update on this?

@Polber Polber force-pushed the jkinard/yaml-examples branch from 9e9bf7b to 3dcd076 Compare February 22, 2024 21:14
@Polber Polber requested a review from robertwb February 22, 2024 21:14
@Polber
Copy link
Contributor Author

Polber commented Feb 22, 2024

@robertwb All comments addressed and rebased on master

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments, then LGTM.

### Element-wise
These examples leverage the built-in mapping transforms including `MapToFields`,
`Filter` and `Explode`. More information can be found about mapping transforms
[here](../docs/yaml_mapping.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that they're live, let's point to the official docs on https://beam.apache.org/documentation/sdks/yaml/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pointed to UDF section since that is where MapToFields lives

sdks/python/apache_beam/yaml/examples/README.md Outdated Show resolved Hide resolved
# see https://cloud.google.com/docs/authentication/external/set-up-adc for more
# information
#
# This pipeline reads in a text file, maps all words to a value of "1", sums
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps intersperse these comments with the code itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertwb I refactored the example a bit to make it follow the logic more semantically. It also outputs Row(word=..., count=...) instead of Row(output="word: count")

Let me know what you think

@Polber Polber force-pushed the jkinard/yaml-examples branch 2 times, most recently from 0f08592 to c36b68a Compare February 27, 2024 02:02
Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once we fix the failing tests and missing license headers.

@Polber Polber force-pushed the jkinard/yaml-examples branch 2 times, most recently from 99d4115 to 94488af Compare February 28, 2024 21:17
@robertwb
Copy link
Contributor

Looks like you need to add an apache license preamble to sdks/python/apache_beam/yaml/examples/README.md to make RAT happy.

@Polber
Copy link
Contributor Author

Polber commented Feb 29, 2024

Looks like you need to add an apache license preamble to sdks/python/apache_beam/yaml/examples/README.md to make RAT happy.

Are the other tests known issues? I don't see how my PR is affecting huggingface and rowcoder tests

@kennknowles
Copy link
Member

Perhaps rebasing will get to green if there were issues on master?

@robertwb
Copy link
Contributor

robertwb commented Mar 4, 2024

I kicked off some re-runs, but I'm not sure if it always rebases on head. +1 to merging with master and re-pushing if this doesn't get things green.

@kennknowles
Copy link
Member

I think it does not. The various merge and HEAD commits associated with a PR are, I think, only updated when the head commit of the PR changes. So reruns will be against the same commit. (I could have this wrong, of course)

@Polber Polber force-pushed the jkinard/yaml-examples branch 3 times, most recently from 0dd6cf8 to c9fc5e7 Compare March 7, 2024 21:16
Polber added 7 commits March 7, 2024 16:53
Signed-off-by: Jeffrey Kinard <[email protected]>
Signed-off-by: Jeffrey Kinard <[email protected]>
Signed-off-by: Jeffrey Kinard <[email protected]>
Signed-off-by: Jeffrey Kinard <[email protected]>
Signed-off-by: Jeffrey Kinard <[email protected]>
Signed-off-by: Jeffrey Kinard <[email protected]>
Signed-off-by: Jeffrey Kinard <[email protected]>
@Polber Polber force-pushed the jkinard/yaml-examples branch from c9fc5e7 to 7415134 Compare March 7, 2024 21:53
Signed-off-by: Jeffrey Kinard <[email protected]>
@Polber Polber force-pushed the jkinard/yaml-examples branch from 7415134 to 7012d72 Compare March 7, 2024 22:21
Polber added 2 commits March 7, 2024 17:53
Signed-off-by: Jeffrey Kinard <[email protected]>
Signed-off-by: Jeffrey Kinard <[email protected]>
@Polber Polber force-pushed the jkinard/yaml-examples branch 2 times, most recently from f134f97 to d5d9447 Compare March 8, 2024 02:46
Signed-off-by: Jeffrey Kinard <[email protected]>
@Polber Polber force-pushed the jkinard/yaml-examples branch from d5d9447 to 8783561 Compare March 8, 2024 03:14
@Polber
Copy link
Contributor Author

Polber commented Mar 8, 2024

@robertwb

Finally got the tests green.

The RowCoderTest imports beam, which imports the examples_test.py file I added, which used to import main.py. In main.py, there is the line LogicalType.register_logical_type(MillisInstant) - This overrided the MicrosInstant that RowCoderTest requires to pass.

I adjusted the test to run yaml_transform.expand_pipeline instead, which fixed that issue. I also realized that the examples tests were not actually running on presubmits, so I added several fixes to get them running and green.

@robertwb robertwb merged commit 53e8efa into apache:master Mar 8, 2024
74 checks passed
@robertwb
Copy link
Contributor

robertwb commented Mar 8, 2024

Thanks. Nice to finally get this in.

hjtran pushed a commit to hjtran/beam that referenced this pull request Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants