Use Knuth–Morris–Pratt algorithm for delimiter search in TextIO #32398
R: @scwhittle, @Abacn
Hi @scwhittle, I see you're working on PR #32258 to fix issue #32241. I created this PR to fix issue #32251, but I believe it can also fix #32249.
Hi @Abacn, I see you approved PR #32298, but it cannot fix all the cases of #32241. Could you check it again?
Thanks for putting this together. I got bogged down trying to make startReading correct as well, but that is probably best fixed separately.
startReading is incorrect in several cases: when the offset is smaller than the delimiter length, when it is smaller than where the headers end, when there is both a BOM and headers, and so on. I uncovered these while adding integration tests with longer delimiters to sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java
@scwhittle Could you continue the review? I added a commit that changes the delimiter search algorithm. Thanks.
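For readers unfamiliar with the algorithm named in the title, here is a minimal sketch of Knuth–Morris–Pratt search for a delimiter in a byte array. It is not the code from this PR or from TextSource.java; the class name KmpDelimiterSketch and the methods buildFailureTable and indexOf are hypothetical names chosen for illustration, and the sketch assumes the whole input is already in memory rather than read in buffered chunks.

```java
import java.nio.charset.StandardCharsets;

public class KmpDelimiterSketch {

  /**
   * Builds the KMP failure table. table[i] is the length of the longest
   * proper prefix of delimiter[0..i] that is also a suffix of it.
   */
  public static int[] buildFailureTable(byte[] delimiter) {
    int[] table = new int[delimiter.length];
    int k = 0; // length of the currently matched prefix
    for (int i = 1; i < delimiter.length; i++) {
      while (k > 0 && delimiter[i] != delimiter[k]) {
        k = table[k - 1]; // fall back to the next shorter border
      }
      if (delimiter[i] == delimiter[k]) {
        k++;
      }
      table[i] = k;
    }
    return table;
  }

  /**
   * Returns the index of the first byte of the first occurrence of
   * delimiter in data, or -1 if it does not occur.
   */
  public static int indexOf(byte[] data, byte[] delimiter) {
    if (delimiter.length == 0) {
      return 0;
    }
    int[] table = buildFailureTable(delimiter);
    int matched = 0; // number of delimiter bytes matched so far
    for (int i = 0; i < data.length; i++) {
      // On a mismatch, keep the longest partial match that is still valid
      // instead of resetting to zero.
      while (matched > 0 && data[i] != delimiter[matched]) {
        matched = table[matched - 1];
      }
      if (data[i] == delimiter[matched]) {
        matched++;
      }
      if (matched == delimiter.length) {
        return i - delimiter.length + 1;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] data = "aaab".getBytes(StandardCharsets.US_ASCII);
    byte[] delimiter = "aab".getBytes(StandardCharsets.US_ASCII);
    System.out.println(indexOf(data, delimiter)); // prints 1
  }
}
```

The input in main shows why the fallback matters in general: a matcher that resets its match count to zero on the third byte of "aaab" would never find "aab", while the failure table keeps one matched byte and finds the delimiter at index 1.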
Just some minor cleanup comments. Thanks!
Run Java_IOs_Direct PreCommit
Waiting on tests to complete to merge
@scwhittle I fixed a bug in one of the commits that was causing the test to fail. Could you continue the review? Thanks.
I believe this PR can fix #32251 (it is closed but still not fully fixed; see this case) and #32249.
Fix #32249, #32251.