-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(snapshot-backfill): only receive mutation from barrier worker for snapshot backfill #18210
Conversation
…r snapshot backfill
Just make sure my understanding is correct, the behavior after this PR is:
Both LGTM.
But there is one thing that confuses me: why are we going to erase the mutation of barrier received from upstream. Are we going to erase it only in the dowstream streaming graph? If yes, what is the benefit of doing so given that the drop subscription mutation must be attached to upstream barrier to notify materialized executor to stop writing log store. |
Yes, this is correct.
The motivation is to get rid of the current hacky pre-sync mutation as described in the PR description. The input executor of the snapshot backfill actor still receive barriers from upstream. After #17613, the input executor receives the mutation of the received barriers , let's say with epoch 100, from local barrier worker. However, during snapshot backfill, the snapshot backfill actor only receive the mutation of fake barrier, let's say epoch 1, and cannot get the mutation of the upstream barrier with epoch 100. Therefore, we currently pre-sync the mutation of the epoch 100 to the snapshot backfill actor, so that its input executor can get the upstream mutation and will not be blocked. But this pre-sync mechanism is too hacky, and I am trying to get rid of this hacky mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
During snapshot backfill, we control the behavior via barrier mutation. Snapshot backfill cares about two mutation: the initial add mutation to get the log store subscriber info, and the drop subscription mutation to get notified on starting consuming upstream.
In snapshot backfill executor, we receive barriers from both upstream and local barrier worker. In current main branch, we use the mutations of the barriers received from upstream. Under this design, we will have to introduce the hacky pre-sync mutation mechanism, without which when we are still under small fake epoch when consuming upstream snapshot, we won't be able to get the mutation of upstream barrier, because the upstream barrier has later epoch than the latest injected epoch of the backfilling partial graph.
To get rid of this pre-sync mutation mechanism, we should first change to use the mutation of barriers received from local barrier worker, which is what we did in this PR. The initial add mutation is changed to be sent from the first fake barrier rather than the first upstream barrier.
The implementation of
UpstreamBuffer
is also changed.UpstreamBuffer
should be aware of the drop subscription mutation to stop consuming upstream. Since previously the mutation is carried in the upstream barrier, theUpstreamBuffer
only need to receive barrier from upstream. However, since we are going to erase the mutation of barrier received from upstream, theUpstreamBuffer
now also need to take the barrier receiver from local barrier worker.In the phase of consuming snapshot, the
UpstreamBuffer
will receive upstream data with no fear, because it's not likely to get notified about the finish of backfill in this phase, and it won't be responsible for receiving barrier from local barrier worker yet. When we enter the phase that consumes log store, we should be aware of the barrier mutation, and therefore theUpstreamBuffer
starts receiving barriers from local barrier worker and always check the mutation to decide whether to stop consuming upstream.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.