I'm assuming that the write amplification within cg2 / cg3 is still due to the data misalignment factor.
It doesn't seem reasonable to perform a split directly during new table creation or the recovery phase. (We don't support merge at the moment.)
I prefer to do some data analysis in the flush phase and perform a split on the SST to promote boundary alignment.
By split you mean putting data related to specific table ids in separate SSTs, not splitting compaction group, right?
If that is the case, is this a permanent change (applied to all future data related to these tables) or a temporary change (only applied to data related to these tables in some period)?
Describe the bug
In Hummock, the decision to split a compaction group is made by counting the flush throughput of the table.
risingwave/src/meta/src/hummock/manager/mod.rs
Line 2597 in 41f4ad5
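To make the mechanism concrete, here is a minimal sketch of a throughput-based split decision that is smoothed over a fixed number of recent samples, one sample per committed epoch. It is not the code in `mod.rs`: the type `ThroughputWindow`, its methods, the "window must be full" rule, and the average-versus-threshold comparison are all assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-table sliding window of flushed bytes.
/// One sample is recorded per committed epoch (assumption, not the real struct).
struct ThroughputWindow {
    window_size: usize,
    samples: HashMap<u32, VecDeque<u64>>,
}

impl ThroughputWindow {
    fn new(window_size: usize) -> Self {
        assert!(window_size > 0, "window_size must be positive");
        Self {
            window_size,
            samples: HashMap::new(),
        }
    }

    /// Record the bytes flushed for `table_id` in one committed epoch,
    /// evicting the oldest sample once the window is full.
    fn record_commit(&mut self, table_id: u32, flushed_bytes: u64) {
        let window = self.samples.entry(table_id).or_default();
        if window.len() == self.window_size {
            window.pop_front();
        }
        window.push_back(flushed_bytes);
    }

    /// Assumed rule: a table is a split candidate only when its window is
    /// full and the average sample reaches `threshold_bytes`.
    fn should_split(&self, table_id: u32, threshold_bytes: u64) -> bool {
        match self.samples.get(&table_id) {
            Some(w) if w.len() == self.window_size => {
                let total: u64 = w.iter().sum();
                total / self.window_size as u64 >= threshold_bytes
            }
            _ => false,
        }
    }
}
```

Under these assumptions, a caller would invoke `record_commit` once per table at every epoch commit and then check `should_split`. Because samples only arrive when an epoch commits, a slow barrier delays how quickly the window fills, and therefore delays any split decision that depends on it.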
To minimize the effects of jitter, we introduce the concept of window_size to make the statistics more accurate, and add new statistics to the window at each commit_epoch. https://github.com/risingwavelabs/risingwave/blob/41f4ad55c636836fc9c7f7860ada535e26dbd6ca/src/meta/src/hummock/manager/mod.rs#L1779
Recently, we found that when a Barrier contains a large amount of data, we can't update the statistical information in time (affected by the barrier latency), and thus can't trigger the split in time.
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
No response