Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-14269] PulsarIOTest.testReadFromSimpleTopic flaky test #17473

Closed
wants to merge 14 commits into from

Conversation

MarcoRob
Copy link
Contributor

@MarcoRob MarcoRob commented Apr 26, 2022

PulsarIOTest uses a shared client across all tests, looks is the reason for the flaky test since is a race condition when multiple tests run in parallel.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

1 similar comment
@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@asf-ci
Copy link

asf-ci commented Apr 26, 2022

Can one of the admins verify this patch?

1 similar comment
@asf-ci
Copy link

asf-ci commented Apr 26, 2022

Can one of the admins verify this patch?

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

7 similar comments
@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 2, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 2, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 5, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 6, 2022

Run Java PreCommit

6 similar comments
@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 6, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 6, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 6, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 9, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 9, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented May 9, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 6, 2022

Run Java PreCommit

2 similar comments
@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 6, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 6, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 7, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 7, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 8, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 8, 2022

Status update: looks the issue regarding this flaky test is due the seek function from Pulsar Client, when it was seeking the timestamp, sometimes it sought a previous timestamp that was already processed then the error that the timestamp was not in the new range
Solution approach: set a validation when the start timestamp or initial restriction is greater than the current timestamp, then re-run the process until it grabs the correct timestamp.

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 8, 2022

Run Java PreCommit

5 similar comments
@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 8, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

MarcoRob commented Jul 8, 2022

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Run Java PreCommit

@MarcoRob
Copy link
Contributor Author

Hi @johnjcasey
I have been monitoring this flaky test which is throwing the following issue
Caused by: java.lang.IllegalArgumentException: Trying to claim offset 1648580086959 before start of the range [1648580087254, 1648580087266) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) at org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker.tryClaim(OffsetRangeTracker.java:97) at org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker.tryClaim(OffsetRangeTracker.java:38) at org.apache.beam.repackaged.direct_java.sdk.fn.splittabledofn.RestrictionTrackers$RestrictionTrackerObserver.tryClaim(RestrictionTrackers.java:59) at org.apache.beam.sdk.io.pulsar.ReadFromPulsarDoFn.processElement(ReadFromPulsarDoFn.java:170)

The issue looks like is due when the processElement search for the timestamp it fails because sometimes it set the cursor of the offset to a previous position that has already been processed which produces the above issue.

I set the following validation when this issue happens and if it does resume again the process until it is correct, let me know what do you think about it?

@MarcoRob
Copy link
Contributor Author

Hi @johnjcasey
Could you help me to validate if my fix to this issue is correct?
You can see above this comment the detail on the current issue
Thanks in advance

@johnjcasey
Copy link
Contributor

It seems like this is an issue with your seeking & committing logic to me. If seek is finding an element that has already been claimed, it would imply that that element hasn't been committed properly, or that your splitting logic isn't splitting correctly. Just retrying doesn't seem like a good pattern

@MarcoRob
Copy link
Contributor Author

It seems like this is an issue with your seeking & committing logic to me. If seek is finding an element that has already been claimed, it would imply that that element hasn't been committed properly, or that your splitting logic isn't splitting correctly. Just retrying doesn't seem like a good pattern

Since the current client we are using is a Reader Interface and this interface doesn't commit the message, only reads. Then it would require committing/acknowledging the message using the Consumer interface or the Admin client which as well is another thing to add within the TODO list for the current IO.

Maybe @nlu90 can give us a better approach to the current issue?

@github-actions
Copy link
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 16, 2023
@github-actions
Copy link
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@hpvd
Copy link

hpvd commented Apr 24, 2024

added this to [Parent issue] Support for Apache Pulsar #31078

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants