feat(sink): do compact per barrier with the sink pk in the sink executor #13411
Comments
The motivations didn't fully convince me.
For these cases, I think
"Compact per barrier" was proposed to solve the "sink key & stream key mismatch" problem. Here it sounds like an abuse of it.
Some efficient computation of the executors results that such as projectSet
Is it possible to do the compaction on the fly during project and projectset? I imagine that only the adjacent items with the same key need to be checked and compacted. Will the overhead be significant?
Yes, we can do that in some places as an optimization (#13409). But the problem is that in some cases it is difficult to keep the update delete and update insert adjacent. e.g. 1: when a join key or an over window partition key is updated, the old value and the new value could be shuffled into different partitions.
Based on our previous discussion, my understanding is that we finally decided to choose Option 2 of the 2 options below:
Today I am just trying to follow Option 2. As for the exceptions you mentioned:
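To illustrate why compacting by key over the whole barrier does not depend on adjacency, here is a minimal sketch. It assumes a simplified change log of `(op, sink_pk, row)` tuples with `"+"` for insert and `"-"` for delete, and a well-formed log per key; `compact_per_barrier` is a hypothetical name and not the actual sink executor code.

```python
from typing import Dict, Hashable, List, Tuple

# (op, sink_pk, row); op is "+" (insert) or "-" (delete). Illustrative only.
Change = Tuple[str, Hashable, object]


def compact_per_barrier(changes: List[Change]) -> List[Change]:
    """Reduce all changes buffered within one barrier to the net change per key."""
    first: Dict[Hashable, Tuple[str, object]] = {}
    last: Dict[Hashable, Tuple[str, object]] = {}
    for op, key, row in changes:
        if key not in first:
            first[key] = (op, row)  # first change seen for this key
        last[key] = (op, row)       # overwritten until the barrier arrives

    out: List[Change] = []
    for key in first:  # dicts keep the order in which keys were first seen
        first_op, first_row = first[key]
        last_op, last_row = last[key]
        if first_op == "-":
            # The row existed before this barrier, so its old value is retracted.
            out.append(("-", key, first_row))
        if last_op == "+":
            # The row exists after this barrier, so its final value is emitted.
            out.append(("+", key, last_row))
    return out


# The delete and insert for key 1 are not adjacent, yet they still compact.
changes = [("-", 1, "a"), ("+", 2, "x"), ("+", 1, "b"),
           ("-", 2, "x"), ("-", 1, "b"), ("+", 1, "c")]
print(compact_per_barrier(changes))
```

Key 2 is inserted and deleted within the same barrier, so nothing is emitted for it, while key 1 collapses to a single delete of the old row and a single insert of the final row.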
1 is inevitable. 2 is a trade-off for us.
I understand that you want to solve an issue completely. The problem is, IMO, the proposed solution isn't free, and the price seems to be even higher than the benefit.
This issue proposes to always do compaction in the sink executor for sinks with a key/primary key, because
cons
With this change, freshness is bounded by the barrier interval. Users need to configure the barrier interval to achieve better freshness.
FORCE APPEND ONLY SINK
Thanks @xiangjinwu, see #9443 (comment). The primary key defined on an MQ is actually the partition key, and the MQ maintains the order of events with the same key.
For a normal append only sink, RW never retracts records, and records can be reordered anyway under SQL semantics. But the widely used "force append only" semantic converts a retractable stream into an append only stream by keeping only the insert operations. Under this definition, the order of events for a given key matters.
So the common per barrier compaction will also be done on the force append only sink.
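A rough sketch of what this could look like for a force append only sink, assuming the same simplified `(op, key, row)` change format with `"+"` for insert and `"-"` for delete. The function name is hypothetical. After per barrier compaction, keeping only the inserts is equivalent to emitting the final row of each key whose last change in the barrier is an insert.

```python
from typing import Dict, Hashable, List, Tuple

# (op, key, row); op is "+" (insert) or "-" (delete). Illustrative only.
Change = Tuple[str, Hashable, object]


def force_append_only_per_barrier(changes: List[Change]) -> List[Tuple[Hashable, object]]:
    """Emit at most one insert per key for one barrier, dropping all deletes."""
    last: Dict[Hashable, Tuple[str, object]] = {}
    for op, key, row in changes:
        last[key] = (op, row)  # only the final change per key survives
    # Keys whose final change is a delete produce no output at all.
    return [(key, row) for key, (op, row) in last.items() if op == "+"]


barrier_changes = [("+", 1, "a"), ("-", 1, "a"), ("+", 1, "b"),
                   ("+", 2, "x"), ("-", 2, "x")]
print(force_append_only_per_barrier(barrier_changes))
```

Without compaction, the same input would append `a`, `b`, and `x` to the sink. With it, only the final value `b` of key 1 is appended, which shows the trade-off: intermediate values within a barrier are no longer visible downstream.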