refactor(state-table): make consistent op an enum in state table #16471

Merged
merged 3 commits into from
Apr 24, 2024

Conversation

@wenym1 wenym1 commented Apr 24, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Previously, the state table had a bool flag is_consistent_op. After #15301, when is_consistent_op is true, there is a further flag is_log_store that indicates whether we need to write and store the old value. This PR replaces is_consistent_op with an enum StateTableOpConsistencyLevel that has three variants.

Besides, before this PR, the materialize executor needed to carefully decide, on receiving a barrier, whether the consistency level had changed and whether to call commit or commit_with_switch_consistent_op. With this PR, it always calls a single commit method and specifies the expected op consistency level. The state table checks whether the op consistency level has changed, and only if it has, passes the switch info to the state store.
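
To illustrate the new calling convention, here is a minimal sketch, not the actual RisingWave code. `SketchStateTable` and the shape of its `commit` method are hypothetical; only the enum name and its variants come from this PR. The caller always passes the level it expects, and the table reports a switch only when the level actually changes.

```rust
/// Mirrors the enum introduced in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StateTableOpConsistencyLevel {
    Inconsistent,
    ConsistentOldValue,
    LogStoreEnabled,
}

/// Hypothetical stand-in for the state table; it only tracks the current level.
pub struct SketchStateTable {
    level: StateTableOpConsistencyLevel,
}

impl SketchStateTable {
    pub fn new(level: StateTableOpConsistencyLevel) -> Self {
        Self { level }
    }

    /// Hypothetical commit: the caller always states the level it expects.
    /// Returns `Some(new_level)` only when a switch would need to be
    /// forwarded to the state store, and `None` otherwise.
    pub fn commit(
        &mut self,
        expected: StateTableOpConsistencyLevel,
    ) -> Option<StateTableOpConsistencyLevel> {
        if self.level == expected {
            None
        } else {
            self.level = expected;
            Some(expected)
        }
    }
}
```

With this shape, the executor no longer has to compare levels itself; the comparison lives in one place inside the table.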

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Comment on lines 256 to 260
pub enum StateTableOpConsistencyLevel {
    Inconsistent,
    ConsistentOldValue,
    LogStoreEnabled,
}

It seems that these three levels are in a progressive relationship. Could you elaborate more on how LogStoreEnabled is different from ConsistentOldValue and why we need this?

@wenym1 wenym1 Apr 24, 2024

LogStoreEnabled has the same op consistency requirement as ConsistentOldValue. The difference is in what gets stored. In release mode, where there is no old value sanity check, ConsistentOldValue behaves the same as Inconsistent: it writes and stores only the new value to the state store. With LogStoreEnabled, we also write and upload the old value to the state store.

The old value is written and uploaded to state store so that it can be combined with the new value to replay the change log. The change log will be used in partial checkpoint and subscription.

LogStoreEnabled is not used in this PR yet. It will be used when we enable L0 log store for subscription.
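
The distinction described above can be summarized in a small sketch. The two helper functions are hypothetical and exist only for illustration; they are not part of the PR. They encode the explanation given here: the last two levels share the same consistency requirement, and only LogStoreEnabled persists the old value.

```rust
/// Mirrors the enum introduced in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StateTableOpConsistencyLevel {
    Inconsistent,
    ConsistentOldValue,
    LogStoreEnabled,
}

/// Hypothetical helper: does this level require ops to be consistent
/// with the existing state?
pub fn requires_consistent_op(level: StateTableOpConsistencyLevel) -> bool {
    match level {
        StateTableOpConsistencyLevel::Inconsistent => false,
        StateTableOpConsistencyLevel::ConsistentOldValue
        | StateTableOpConsistencyLevel::LogStoreEnabled => true,
    }
}

/// Hypothetical helper: is the old value written and uploaded to the
/// state store alongside the new value?
pub fn persists_old_value(level: StateTableOpConsistencyLevel) -> bool {
    matches!(level, StateTableOpConsistencyLevel::LogStoreEnabled)
}
```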

@wenym1 wenym1 requested a review from chenzl25 April 24, 2024 09:11
@chenzl25 chenzl25 left a comment

Rest LGTM

src/stream/src/executor/mview/materialize.rs Outdated
@wenym1 wenym1 enabled auto-merge April 24, 2024 09:41
@wenym1 wenym1 added this pull request to the merge queue Apr 24, 2024
Merged via the queue into main with commit 97e1bd0 Apr 24, 2024
27 of 28 checks passed
@wenym1 wenym1 deleted the yiming/state-table-op-consistency-enum branch April 24, 2024 10:36
@@ -58,6 +58,21 @@ pub struct MaterializeExecutor<S: StateStore, SD: ValueRowSerde> {
conflict_behavior: ConflictBehavior,

version_column_index: Option<u32>,

may_have_downstream: bool,

Why is this may_have_downstream instead of has_downstream?

The flag is only updated when a downstream is newly created, and is not updated when downstreams are dropped, so the value can be a false positive.
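
A minimal sketch of this one-way behavior, using a hypothetical `DownstreamHint` type rather than the real executor fields: the flag can flip to true when a downstream is added, but nothing ever resets it.

```rust
/// Hypothetical holder for the `may_have_downstream` flag.
pub struct DownstreamHint {
    may_have_downstream: bool,
}

impl DownstreamHint {
    pub fn new(initial: bool) -> Self {
        Self { may_have_downstream: initial }
    }

    /// A newly created downstream sets the flag.
    pub fn on_downstream_added(&mut self) {
        self.may_have_downstream = true;
    }

    /// Dropping a downstream intentionally leaves the flag untouched,
    /// which is why the name says "may" rather than "has".
    pub fn on_downstream_dropped(&mut self) {}

    pub fn may_have_downstream(&self) -> bool {
        self.may_have_downstream
    }
}
```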

Sure. That matches my understanding. But would there be any problem if we also handled the drop and re-enabled the switch?

@BugenZhao mentioned this optimization
#16348 (comment)

Previously we could not do this because there is not enough information in the barrier that carries the drop streaming job command. @chenzl25 Can you provide more details?

But I think it can be implemented if we incrementally maintain a full copy of downstream information in the mv executor.
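
One way such a full copy might look, as a rough sketch only: `DownstreamTracker`, its method names, and the `ActorId` alias are hypothetical, and the sketch assumes the executor can learn the downstream actor ids when they are added. It is not how the executor works today.

```rust
use std::collections::HashSet;

/// Hypothetical actor id alias for this sketch.
pub type ActorId = u32;

/// Hypothetical full copy of the downstream actors known to the executor.
#[derive(Default)]
pub struct DownstreamTracker {
    downstream_actors: HashSet<ActorId>,
}

impl DownstreamTracker {
    /// Record downstream actors that were newly attached.
    pub fn on_added(&mut self, actors: impl IntoIterator<Item = ActorId>) {
        self.downstream_actors.extend(actors);
    }

    /// Forget any tracked downstream actor that appears in the dropped set.
    pub fn on_dropped(&mut self, dropped: &HashSet<ActorId>) {
        self.downstream_actors.retain(|id| !dropped.contains(id));
    }

    /// Exact answer instead of a "may have" hint.
    pub fn has_downstream(&self) -> bool {
        !self.downstream_actors.is_empty()
    }
}
```

The trade-off is extra state in the executor in exchange for being able to switch the consistency level back once the last downstream is gone.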

@chenzl25 chenzl25 Apr 26, 2024

Take AddMutation as an example. It contains pub adds: HashMap<ActorId, Vec<PbDispatcher>>, which tells us how many dispatchers are added for each actor. UpdateMutation, however, only has dropped_actors: HashSet<ActorId>, so we don't have enough information to maintain the dispatcher count for MaterializedView.
