fix: s3 source cannot read incremental files #18017

Merged
merged 23 commits into from
Aug 30, 2024

Conversation

tabVersion
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

The previous fix (#17702) did not handle this case correctly.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@github-actions github-actions bot added the type/fix Bug fix label Aug 13, 2024
Signed-off-by: tabVersion <[email protected]>
@tabVersion tabVersion force-pushed the tab/fix-s3-read-file branch from fe5754d to 2c48e34 Compare August 13, 2024 05:31
Signed-off-by: tabVersion <[email protected]>
@tabVersion tabVersion added the need-cherry-pick-release-1.10 Open a cherry-pick PR to branch release-1.10 after the current PR is merged label Aug 13, 2024
Signed-off-by: tabVersion <[email protected]>
@hzxa21
Collaborator

hzxa21 commented Aug 14, 2024

Maybe related: #17991

Comment on lines 74 to 75
time.sleep(10)

Member

Can we make this test more robust? For example: 1. upload some files; 2. check; 3. upload another batch of files; 4. check again.

Contributor Author

I think it does the same thing

first check

check after upload

success_flag = check_for_new_files(FILE_NUM, ITEM_NUM_PER_FILE, fmt)

Member

Then why didn't it catch the bug before? 👀

Contributor Author

I am surprised, too. 😄

Member

So can we make this test more robust, e.g. by interleaving uploads and checks? If a single sleep can affect correctness, the test seems too fragile.
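One way to reduce the dependence on a single fixed sleep is to poll with a deadline after each upload. The following is only a sketch of that idea, not the test in this PR: `wait_until`, `run_interleaved`, and the `upload_batch` / `count_rows` callables are hypothetical names introduced for illustration.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def run_interleaved(
    upload_batch: Callable[[int], None],  # hypothetical: uploads batch number i
    count_rows: Callable[[], int],        # hypothetical: rows visible in the source
    batches: int,
    rows_per_batch: int,
    **wait_kwargs: float,
) -> bool:
    """Upload one batch, wait for its rows to appear, then repeat."""
    expected = 0
    for batch in range(batches):
        upload_batch(batch)
        expected += rows_per_batch
        # Check after every upload instead of once at the end.
        if not wait_until(lambda: count_rows() == expected, **wait_kwargs):
            return False
    return True
```

With this shape, a slow refresh only lengthens the wait up to the timeout rather than deciding whether the test passes or fails.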

Member

Sorry, I misunderstood it. do_test is the first check, and check_for_new_files is the second check. I only looked at the usages of check_for_new_files and thought it was the only check.

@wcy-fdu
Contributor

wcy-fdu commented Aug 20, 2024

New update:

  1. Fixed the issue where existing (stock) files could not be read.
  2. Temporarily set the sleep interval to 1 minute; we will try to make it a configuration item later (tracked in "feat: configurable incremental file refresh interval for file source" #18123).

Member

@xxchan xxchan left a comment

CI didn't catch the bug because the CI job passed even when the test failed.

https://buildkite.com/risingwavelabs/main-cron/builds/3181#019175ab-4d30-44bd-aba4-562ab5ebd33a
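A common cause of a failing test still reporting success is a script that logs the failure but exits with status 0. The sketch below shows one way a Python test script could propagate failures through its exit code. It is an assumption about the general pattern, not the actual script or fix in this PR, and `exit_code_for` and `main` are hypothetical names.

```python
import sys
from typing import Callable, Sequence


def exit_code_for(results: Sequence[bool]) -> int:
    """Return 0 only if at least one check ran and every check passed."""
    return 0 if results and all(results) else 1


def main(checks: Sequence[Callable[[], bool]]) -> int:
    """Run every check and map the outcomes to a process exit code."""
    results = []
    for check in checks:
        try:
            results.append(bool(check()))
        except Exception as exc:
            # An exception counts as a failure instead of being swallowed.
            print(f"check raised: {exc!r}", file=sys.stderr)
            results.append(False)
    return exit_code_for(results)


# Hypothetical usage at the end of a test script:
#     sys.exit(main([first_check, second_check]))
```

Because the runner treats a non-zero exit status as failure, passing the result of `main` to `sys.exit` lets a failed check fail the job.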


@wcy-fdu
Contributor

wcy-fdu commented Aug 22, 2024

Since I also modified this PR, it needs approval from another reviewer (@xxchan @fuyufjh). In short, this PR fixes the problem of reading newly added files. The json and csv encodings are fixed so far. There is a separate problem with the parquet encoding, which will be fixed in another PR.

Member

@xxchan xxchan left a comment

Please fix the test and CI.

@tabVersion tabVersion requested a review from xxchan August 22, 2024 09:36
@graphite-app graphite-app bot requested a review from a team August 22, 2024 16:46
@fuyufjh fuyufjh self-requested a review August 23, 2024 03:02
@wcy-fdu
Contributor

wcy-fdu commented Aug 30, 2024

This test (S3 source and sink on parquet file) is flaky. It also fails in other PRs, regardless of the changes made in this PR.

@wcy-fdu wcy-fdu enabled auto-merge August 30, 2024 03:12
@wcy-fdu wcy-fdu added this pull request to the merge queue Aug 30, 2024
Merged via the queue into main with commit 70c1146 Aug 30, 2024
30 of 31 checks passed
@wcy-fdu wcy-fdu deleted the tab/fix-s3-read-file branch August 30, 2024 03:56
@wcy-fdu wcy-fdu restored the tab/fix-s3-read-file branch August 30, 2024 04:05
tabVersion added a commit that referenced this pull request Sep 2, 2024
Signed-off-by: tabVersion <[email protected]>
Co-authored-by: congyi <[email protected]>
Co-authored-by: congyi wang <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Sep 3, 2024
Signed-off-by: tabVersion <[email protected]>
Co-authored-by: congyi <[email protected]>
Co-authored-by: congyi wang <[email protected]>
Labels
ci/main-cron/run-selected need-cherry-pick-release-1.10 Open a cherry-pick PR to branch release-1.10 after the current PR is merged need-cherry-pick-release-2.0 type/fix Bug fix
5 participants