Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): replace hummock protobuf strcut with rust struct #15386

Merged
merged 28 commits into from
Jul 25, 2024

Conversation

Li0k
Copy link
Contributor

@Li0k Li0k commented Mar 1, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

as title

  1. reduce memory copy of huge struct
  2. Easier to implement some methods for structs, less ext

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@Li0k Li0k requested a review from Little-Wallace March 4, 2024 08:00
@Li0k Li0k marked this pull request as ready for review March 4, 2024 08:01
@Li0k Li0k requested a review from a team as a code owner March 4, 2024 08:01
@Li0k Li0k changed the title feat(storage): replace pb with struct feat(storage): replace hummock protobuf strcut with rust struct Mar 4, 2024
@Li0k Li0k requested a review from wenym1 March 4, 2024 10:44
pub struct SstableInfo {
pub object_id: u64,
pub sst_id: u64,
pub key_range: Option<KeyRange>,
Copy link
Contributor

@wenym1 wenym1 Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use FullKeyRange instead rather than the KeyRange. The field is_right_inclusive is very implicit. Maybe we can do this in a future PR.

fn estimated_encode_len(&self) -> usize;
}

pub trait ProtoSerializeExt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between these two traits? May add some comments to describe the difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all appearance of T is Self, we don't need the T, and we can use Self directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

src/storage/src/hummock/compactor/compactor_runner.rs Outdated Show resolved Hide resolved
src/storage/hummock_sdk/src/version.rs Outdated Show resolved Hide resolved
src/storage/hummock_sdk/src/version.rs Outdated Show resolved Hide resolved
src/meta/src/hummock/compaction/mod.rs Outdated Show resolved Hide resolved
@Li0k Li0k requested review from MrCroxx and hzxa21 March 8, 2024 07:59
Comment on lines 372 to 379
// Similar to `ProtoSerializeExt``, we allow `ProtoSerializeOwnExt to provide a method to obtain ownership. Therefore, copying behavior in memory can be reduced
pub trait ProtoSerializeOwnExt {
type PB: Clone + PartialEq + ::prost::Message;

fn from_protobuf_own(pb: Self::PB) -> Self;

fn to_protobuf_own(self) -> Self::PB;
}
Copy link
Contributor

@MrCroxx MrCroxx Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Similar to `ProtoSerializeExt``, we allow `ProtoSerializeOwnExt to provide a method to obtain ownership. Therefore, copying behavior in memory can be reduced
pub trait ProtoSerializeOwnExt {
type PB: Clone + PartialEq + ::prost::Message;
fn from_protobuf_own(pb: Self::PB) -> Self;
fn to_protobuf_own(self) -> Self::PB;
}
// Similar to `ProtoSerializeExt``, we allow `ProtoSerializeOwnExt to provide a method to obtain ownership. Therefore, copying behavior in memory can be reduced
pub trait OwnedProtoSerialize {
type PB: Clone + PartialEq + ::prost::Message;
fn from_owned_protobuf(pb: Self::PB) -> Self;
fn to_owned_protobuf(self) -> Self::PB;
}

@@ -355,3 +355,25 @@ impl EpochWithGap {
self.0 & EPOCH_SPILL_TIME_MASK
}
}

pub trait ProtoSerializeSizeEstimatedExt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the traits all named with the suffix 'Ext'? Should they give default implementation and implements for all types that applies the trait without Ext?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think defining a new trait is unnecessary, unless we have some common logic that takes the trait abstraction as input, but it seems that there is no such common logic. It's more like enforcing a method name convention.

@MrCroxx
Copy link
Contributor

MrCroxx commented Mar 14, 2024

As discussed offline with @Li0k , it is not necessary to extract a trait for there is nowhere to use it. The structs should impl the From trait directly. e.g.

impl From<PbVnodeWatermark> for VnodeWatermark { ... }
impl From<&PbVnodeWatermark> for VnodeWatermark { ... }
impl From<VnodeWatermark> for PbVnodeWatermark { ... }
impl From<&VnodeWatermark> for PbVnodeWatermark { ... }

If there is a need to use a trait, the trait can be defined with From bound. e.g.

trait ProtoSerialize: From<&PbVnodeWatermark> + From<&VnodeWatermark> {}

And automatically impl for all types that satisfy the bound. e.g.

impl<T: From<&PbVnodeWatermark> + From<&VnodeWatermark>> ProtoSerialize for T {}

@BugenZhao
Copy link
Member

The structs should impl the From trait directly.

The downside of From is that it's a foreign trait to us, which can not be further implemented on foreign types.

Say that we'll always convert Vec<usize> to repeated uint64 (which is Vec<u64> in prost) in protobuf. Currently, the developers have to manually write xx.into_iter().map(|x| x as _).collect() everywhere, which is really verbose.

It would be nice if we can directly write xx.into(). However, this is not possible if we're leveraging the From trait as there's no way to implement something like impl From<Vec<usize>> for Vec<u64>. Instead, a custom trait will enable this and further make it possible to directly derive the implementation by recursively visiting the fields on complex messages.

@MrCroxx
Copy link
Contributor

MrCroxx commented Mar 14, 2024

The structs should impl the From trait directly.

The downside of From is that it's a foreign trait to us, which can not be further implemented on foreign types.

Say that we'll always convert Vec<usize> to repeated uint64 (which is Vec<u64> in prost) in protobuf. Currently, the developers have to manually write xx.into_iter().map(|x| x as _).collect() everywhere, which is really verbose.

It would be nice if we can directly write xx.into(). However, this is not possible if we're leveraging the From trait as there's no way to implement something like impl From<Vec<usize>> for Vec<u64>. Instead, a custom trait will enable this and further make it possible to directly derive the implementation by recursively visiting the fields on complex messages.

Do we need to impl From for low-level type like Vec<usize>? I thought impl on defined message types is enough before. 🤔

@@ -326,7 +325,7 @@ impl NonOverlapSubLevelPicker {
level_ssts.sort_by(|sst1, sst2| {
let a = sst1.key_range.as_ref().unwrap();
let b = sst2.key_range.as_ref().unwrap();
a.compare(b)
a.cmp(b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May use sort_by_key here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the design of interface, we should clone the key_range if we use sort_by_key. Even though copying the key_range after the PR is lightweight, I wanted to keep the clone from being triggered as much as possible.

src/storage/hummock_sdk/src/key_range.rs Outdated Show resolved Hide resolved
src/storage/hummock_sdk/src/change_log.rs Outdated Show resolved Hide resolved
@@ -355,3 +355,25 @@ impl EpochWithGap {
self.0 & EPOCH_SPILL_TIME_MASK
}
}

pub trait ProtoSerializeSizeEstimatedExt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think defining a new trait is unnecessary, unless we have some common logic that takes the trait abstraction as input, but it seems that there is no such common logic. It's more like enforcing a method name convention.

src/storage/hummock_test/src/hummock_read_version_tests.rs Outdated Show resolved Hide resolved
src/meta/src/hummock/compaction/mod.rs Outdated Show resolved Hide resolved
pub uncompressed_file_size: u64,
}

impl From<&PbOverlappingLevel> for OverlappingLevel {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unnecessary to implement a trait for the conversion, because we don't have any code that depends on the From or Into trait. The only benefit is that when we implement From, we can call .into(), but the benefit is limited. The code will be the same if we don't implement the trait and just implement a method for the new struct like

impl OverlappingLevel { // No impl `From<&PbOverlappingLevel>`
    fn from(pb_overlapping_level: &PbOverlappingLevel) -> Self {
        Self {
            ...
        }
    }
}

And then we'd better rename the method name from from, which is too general, to from_protobuf and to_protobuf so that we can explicitly know that we are doing a conversion between new struct and pb when we read the code.

ditto for all other impl From in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not biased. @hzxa21

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases, I want to consume the incoming arguments directly (instead of copying them), and if we want to fulfil this requirement, we still need to implement four functions

  1. from_protobuf(struct)
  2. from_protobuf(struct_ref)
  3. to_protobuf1(struct)
  4. to_protobuf2(struct_ref)

I don't think there's much difference except in naming, as in commit, and I've provided from_protobuf and to_protobuf wrappers for some common structures

@graphite-app graphite-app bot requested a review from a team June 19, 2024 04:34
@risingwavelabs risingwavelabs deleted a comment from graphite-app bot Jun 19, 2024
@BugenZhao BugenZhao removed the request for review from a team June 19, 2024 04:37
Copy link

gitguardian bot commented Jul 11, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9425213 Triggered Generic Password ebcc00b ci/scripts/e2e-sink-test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

src/storage/hummock_sdk/src/version.rs Outdated Show resolved Hide resolved
src/meta/src/hummock/compaction/mod.rs Outdated Show resolved Hide resolved
src/storage/hummock_sdk/src/table_watermark.rs Outdated Show resolved Hide resolved
src/storage/hummock_sdk/src/version.rs Outdated Show resolved Hide resolved
src/storage/hummock_sdk/src/version.rs Outdated Show resolved Hide resolved
}

impl CompactTask {
pub fn task_type(&self) -> PbTaskType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for these methods.

Besides, I think we should not provide method to get the PbXxx type from the CompactTask. We'd better just return the the Xxx defined by ourselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it in other PR, remove the get function first

src/storage/hummock_sdk/src/version.rs Outdated Show resolved Hide resolved
src/meta/service/src/hummock_service.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM! Thanks for the great work!

src/meta/model_v2/src/hummock_sstable_info.rs Outdated Show resolved Hide resolved
src/meta/src/hummock/manager/compaction.rs Outdated Show resolved Hide resolved
.sum::<usize>()
+ size_of::<u32>()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this file and move the new structs to separate files to avoid making this file too large?

@Li0k Li0k enabled auto-merge July 25, 2024 07:01
@Li0k Li0k added this pull request to the merge queue Jul 25, 2024
Merged via the queue into main with commit 1c2c400 Jul 25, 2024
30 of 31 checks passed
@Li0k Li0k deleted the li0k/storage_pb_to_struct branch July 25, 2024 08:28
@zwang28 zwang28 mentioned this pull request Oct 18, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants