Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for SparkRunner streaming stateful processing #33267

Merged
merged 21 commits into from
Dec 18, 2024

Conversation

twosom
Copy link
Contributor

@twosom twosom commented Dec 3, 2024

Please add a meaningful description for your change here

fixes #33237

This PR contains these changes

  • added support for SparkRunner streaming stateful processing

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@twosom twosom changed the title Added support for Spark streaming stateful ParDo Added support for Spark streaming stateful processing Dec 3, 2024
Copy link
Contributor

github-actions bot commented Dec 3, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@twosom
Copy link
Contributor Author

twosom commented Dec 3, 2024

Run Java PreCommit

Copy link
Contributor

github-actions bot commented Dec 3, 2024

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck added as fallback since no labels match configuration

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@twosom twosom changed the title Added support for Spark streaming stateful processing Added support for SparkRunner streaming stateful processing Dec 3, 2024
@liferoad
Copy link
Collaborator

liferoad commented Dec 4, 2024

R: @kennknowles

Copy link
Contributor

github-actions bot commented Dec 4, 2024

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@kennknowles kennknowles self-requested a review December 5, 2024 14:55
@kennknowles
Copy link
Member

Ah can you add a small change to these files, which will cause the ValidatesRunner tests for stateful processing to run?

I don't know the codebase enough at this point to know what is the sharing between the things tested by these jobs.

(incidentally there is a _Java11 trigger file in that directory but the job is gone since 11 is now the default and it is the Java 8 test that is the outlier)

@github-actions github-actions bot added the build label Dec 6, 2024
@twosom
Copy link
Contributor Author

twosom commented Dec 6, 2024

@kennknowles

i touched trigger files!
Thanks!

@twosom
Copy link
Contributor Author

twosom commented Dec 6, 2024

Run Java_Spark3_Versions PreCommit

@twosom
Copy link
Contributor Author

twosom commented Dec 17, 2024

@kennknowles

Hi!
Just wanted to give you a gentle ping about my PR as I've been waiting for a review 😊
Would really appreciate your insights when you have a moment.

Thanks a lot!

@kennknowles
Copy link
Member

Acknowledged! I'm on it.

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution! I admit that I am leaning heaviliy on the testing since I'm a bit rusty on the runner internals and the Spark details, and I don't have time right now to page it all in. I don't want to slow down your contribution any further. I think we can merge, and I just left a small comment which you might adjust in a follow-up PR.


rejectTimers(doFn);
checkArgument(
!signature.processElement().isSplittable(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splittable and stateful are also just not compatible, so this situation should be impossible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! I'll remove this redundant check in a follow-up PR since splittable and stateful are mutually exclusive by design. Thank you for catching this. 😀

@kennknowles kennknowles merged commit f476417 into apache:master Dec 18, 2024
18 checks passed
Copy link

codecov bot commented Dec 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.37%. Comparing base (9232cd8) to head (eda50b2).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #33267   +/-   ##
=========================================
  Coverage     57.37%   57.37%           
  Complexity     1475     1475           
=========================================
  Files           973      973           
  Lines        154905   154918   +13     
  Branches       1076     1076           
=========================================
+ Hits          88879    88887    +8     
- Misses        63810    63821   +11     
+ Partials       2216     2210    -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Support Stateful Processing in Spark Runner (Without Timer Functionality)
3 participants