#26395 Disabling possibly unnecessary prefetching during GroupIntoBatches by using an experimental flag #26618

Closed

Contributor

@nbali nbali commented May 10, 2023

We haven't reached a verdict on the proper way to implement #26395, so I created a PR for each alternative. The chosen one should be reviewed and merged; the rest should be closed. (Alternative PR: #26619)

closes #26395


@nbali
Contributor Author

nbali commented May 10, 2023

Seemingly unrelated failure in Java Tests / Java Wordcount Direct Runner (ubuntu-latest) (pull_request).

@nbali nbali force-pushed the fix_for_26395_by_using_an_experiment branch from 1f6eefe to 14c6555 Compare May 10, 2023 02:05
@nbali nbali force-pushed the fix_for_26395_by_using_an_experiment branch from 14c6555 to 1d62a65 Compare May 10, 2023 02:06
@nbali
Contributor Author

nbali commented May 10, 2023

Run Java_Pulsar_IO_Direct PreCommit

@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@nbali
Contributor Author

nbali commented May 10, 2023

Run Java PreCommit

Contributor

@Abacn Abacn left a comment

LGTM, thanks. Just one naming comment. I would vote to go with this approach for now, since it does not change the current default behavior. We could go one step further and make it opt-out, but I would not remove the prefetching completely at this point.

@@ -110,6 +112,10 @@
public class GroupIntoBatches<K, InputT>
    extends PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> {

  /** Experiment to "avoid possibly unnecessary prefetching". */
  public static final String AVOID_POSSIBLY_UNNECESSARY_PREFETCHING =
      "avoid_possibly_unnecessary_prefetching";

Consider a more descriptive name, e.g. disable_groupintobatches_prefetch
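
For readers unfamiliar with how such a flag is consumed, here is a minimal sketch in plain Java, with no Beam dependency, of gating prefetching on an experiment name. The class `ExperimentFlagSketch` and the methods `hasExperiment` and `shouldPrefetch` are hypothetical and are not taken from the PR; only the constant's name and value come from the diff above. In Beam itself, experiments are normally supplied through the `--experiments` pipeline option and can be checked with `ExperimentalOptions.hasExperiment`; that this PR wires the flag up that way is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: not Beam code, it only mimics an experiment-flag check. */
final class ExperimentFlagSketch {

  /** Name and value copied from the diff under review. */
  static final String AVOID_POSSIBLY_UNNECESSARY_PREFETCHING =
      "avoid_possibly_unnecessary_prefetching";

  /** Returns true if the experiment name is present; a null list counts as empty. */
  static boolean hasExperiment(List<String> experiments, String name) {
    return experiments != null && experiments.contains(name);
  }

  /** Prefetching stays on by default and is skipped only when the flag is set (opt-in). */
  static boolean shouldPrefetch(List<String> experiments) {
    return !hasExperiment(experiments, AVOID_POSSIBLY_UNNECESSARY_PREFETCHING);
  }

  public static void main(String[] args) {
    List<String> none = Collections.emptyList();
    List<String> flagged =
        Arrays.asList("some_other_experiment", AVOID_POSSIBLY_UNNECESSARY_PREFETCHING);
    System.out.println(shouldPrefetch(none));    // default behavior: prefetch
    System.out.println(shouldPrefetch(flagged)); // flag set: skip prefetch
  }
}
```

This mirrors the opt-in design discussed in the review: the default is unchanged, and making the flag opt-out would amount to inverting the default in `shouldPrefetch`.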

@Abacn
Contributor

Abacn commented May 12, 2023

I tested this and found that it might not be the cause of the processed data volume being much larger than the actual data size: #26395 (comment)

@github-actions
Contributor

Reminder, please take a look at this PR: @robertwb

@tvalentyn
Contributor

waiting on author

@nbali
Contributor Author

nbali commented May 23, 2023

FYI, I'm trying to make time for this this week, but most likely it won't happen. If there is a way to snooze the reminders for 1-2 weeks, feel free to do so.

@github-actions
Contributor

Reminder, please take a look at this PR: @robertwb

@tvalentyn
Contributor

waiting on author

@github-actions
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 31, 2023
@github-actions
Contributor

github-actions bot commented Aug 8, 2023

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 8, 2023
@nbali nbali deleted the fix_for_26395_by_using_an_experiment branch November 18, 2023 05:26
@hpvd

hpvd commented Apr 24, 2024

added this to [Parent issue] Support for Apache Pulsar #31078

@nbali
Contributor Author

nbali commented May 22, 2024

@hpvd I'm pretty sure it's not related to Apache Pulsar

@hpvd

hpvd commented May 22, 2024

> @hpvd I'm pretty sure it's not related to Apache Pulsar

@nbali thanks, I have removed it.

Development

Successfully merging this pull request may close these issues.

[Bug]: Possibly unnecessary prefetch during GroupIntoBatches