refactor(storage): add SST sanity check during commit epoch #18757

Merged: 6 commits into main from wangzheng/check_sst, Sep 30, 2024

Conversation

zwang28
Contributor

@zwang28 zwang28 commented Sep 29, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

SstObjectIdTracker has been used to prevent full GC from deleting SSTs that have been written to the object store but not yet committed to the meta node.
In the context of partial checkpoint, SstObjectIdTracker no longer behaves correctly on the compute node,
so #18641 removed its usage there.

This PR adds an additional SST sanity check during commit epoch, to ensure that SSTs which may already have been GCed are not committed to the Hummock version.

  • The low watermark is the meta node's current time minus the configurable SST retention time.
  • Any SST whose timestamp is below this watermark is rejected at commit.
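The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SstMeta`, `created_at_sec`, `retention_watermark`, and `check_ssts` are hypothetical names, and timestamps are assumed to be seconds since the Unix epoch.

```rust
/// Hypothetical stand-in for the SST metadata reported at commit time.
#[derive(Debug, Clone)]
struct SstMeta {
    object_id: u64,
    /// Assumed field: SST creation time in seconds since the Unix epoch.
    created_at_sec: u64,
}

/// Low watermark = meta node's current time minus the configured retention.
/// `saturating_sub` avoids underflow if the retention exceeds `now_sec`.
fn retention_watermark(now_sec: u64, retention_sec: u64) -> u64 {
    now_sec.saturating_sub(retention_sec)
}

/// Rejects the commit if any SST is older than the watermark, because
/// full GC may already have deleted the underlying object.
fn check_ssts(ssts: &[SstMeta], watermark_sec: u64) -> Result<(), String> {
    for sst in ssts {
        if sst.created_at_sec < watermark_sec {
            return Err(format!(
                "SST {} has timestamp {} below watermark {}",
                sst.object_id, sst.created_at_sec, watermark_sec
            ));
        }
    }
    Ok(())
}
```

With a current time of 10,000 s and a retention of 3,600 s, the watermark is 6,400 s, so an SST created at 9,000 s passes while one created at 6,000 s causes the commit to be rejected.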

There's a trade-off between accuracy and IOPS; please refer to the new comment in builder.rs.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@@ -543,7 +543,7 @@ message VacuumTask {

 // Scan object store to get candidate orphan SSTs.
 message FullScanTask {
-  uint64 sst_retention_time_sec = 1;
+  uint64 sst_retention_watermark = 1;
Contributor Author

The watermark is the meta node's current time minus the SST retention time in seconds.
Previously it was calculated on the compute node, which relied on the compute node's local clock.
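As a rough sketch of what this means on the full-scan side: the worker receives an absolute watermark and compares it only against object timestamps, so its own clock never enters the decision. `ObjectMeta`, `last_modified_sec`, and `gc_candidates` are hypothetical names for illustration, and the strict `<` comparison is an assumption rather than something confirmed by this PR.

```rust
/// Hypothetical stand-in for an entry returned by listing the object store.
#[derive(Debug, Clone, PartialEq)]
struct ObjectMeta {
    key: String,
    /// Assumed field: last-modified time in seconds since the Unix epoch.
    last_modified_sec: u64,
}

/// Returns objects old enough to be considered orphan candidates.
/// `watermark_sec` is computed by the meta node and shipped in the task,
/// so no local clock is read here.
fn gc_candidates(objects: &[ObjectMeta], watermark_sec: u64) -> Vec<&ObjectMeta> {
    objects
        .iter()
        .filter(|o| o.last_modified_sec < watermark_sec)
        .collect()
}
```

Because the same meta-node watermark can also be used by the commit-time check, both sides agree on which SSTs count as too old, regardless of clock skew between nodes.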

Collaborator

Just a note: this change is compatible because the previous sst_retention_time_sec must be way smaller than sst_retention_watermark. That means:

  • If meta is upgraded before compactor, the compactor will be less aggressive in deleting objects, because now - sst_retention_watermark (old logic interpreting the new field) is way smaller.
  • If compactor is upgraded before meta, the compactor will also be less aggressive in deleting objects, because sst_retention_time_sec (new logic interpreting the old field) is way smaller.
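The arithmetic behind this note can be checked with a small sketch. The two functions below are assumptions about how each version interprets field 1 (the old worker subtracts it from its local clock, the new worker uses it directly); they are illustrative and not taken from the codebase.

```rust
/// Assumed old behaviour: field 1 is a retention duration, and the worker
/// derives the watermark from its own clock.
fn old_worker_watermark(local_now_sec: u64, field: u64) -> u64 {
    local_now_sec.saturating_sub(field)
}

/// Assumed new behaviour: field 1 already is the absolute watermark.
fn new_worker_watermark(field: u64) -> u64 {
    field
}

/// Effective watermarks for the two mixed-version cases, given the current
/// time and the configured retention (both in seconds).
fn mixed_version_watermarks(now_sec: u64, retention_sec: u64) -> (u64, u64) {
    // New meta sends `now - retention`; old compactor subtracts it from now.
    let new_meta_old_compactor =
        old_worker_watermark(now_sec, now_sec.saturating_sub(retention_sec));
    // Old meta sends `retention`; new compactor treats it as the watermark.
    let old_meta_new_compactor = new_worker_watermark(retention_sec);
    (new_meta_old_compactor, old_meta_new_compactor)
}
```

With an illustrative current time of 1,727,600,000 s and a retention of 86,400 s, both mixed-version cases yield an effective watermark of 86,400 s, far below the correct 1,727,513,600 s. No real object has a timestamp that small, so nothing is deleted until both sides are upgraded.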

@@ -32,13 +32,14 @@ message BarrierCompleteResponse {
   string request_id = 1;
   common.Status status = 2;
   repeated CreateMviewProgress create_mview_progress = 3;
-  message GroupedSstableInfo {
+  message LocalSstableInfo {
Contributor Author

@zwang28 zwang28 Sep 29, 2024

Since SSTs to commit are no longer grouped, I made the pb type name consistent with the corresponding in-memory type.

@zwang28 zwang28 requested review from wenym1, hzxa21 and Li0k September 29, 2024 08:17
Contributor

@Li0k Li0k left a comment

Rest LGTM

src/storage/src/hummock/sstable/builder.rs (resolved thread)
@zwang28 zwang28 enabled auto-merge September 30, 2024 06:42
@zwang28 zwang28 disabled auto-merge September 30, 2024 06:50
Collaborator

@hzxa21 hzxa21 left a comment

LGTM!

@zwang28 zwang28 added this pull request to the merge queue Sep 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 30, 2024
@zwang28 zwang28 enabled auto-merge September 30, 2024 07:55
@zwang28 zwang28 added this pull request to the merge queue Sep 30, 2024
Merged via the queue into main with commit 7ff4d98 Sep 30, 2024
30 of 33 checks passed
@zwang28 zwang28 deleted the wangzheng/check_sst branch September 30, 2024 08:52
3 participants