Bug(compaction): Unable to trigger split in time, when barrier latency is high #15291

Li0k · 2024-02-27T08:13:37Z

Describe the bug

In Hummock, the decision to split a compaction group is based on the flush throughput of each table.

The check is implemented in async fn on_handle_check_split_multi_group(&self) (risingwave/src/meta/src/hummock/manager/mod.rs, line 2597 at commit 41f4ad5).

To reduce the effect of jitter, we introduced window_size: each table's throughput is measured over a window of recent samples, and a new sample is added to the window at each commit_epoch. See https://github.com/risingwavelabs/risingwave/blob/41f4ad55c636836fc9c7f7860ada535e26dbd6ca/src/meta/src/hummock/manager/mod.rs#L1779
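A rough sketch of such a windowed statistic follows. It is a minimal illustration, not RisingWave's code: the type TableThroughputStats, its methods, and the rule "every sample in a full window must exceed the threshold" are hypothetical assumptions.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-table throughput window (illustrative names, not RisingWave's).
pub struct TableThroughputStats {
    window_size: usize,
    // Bytes flushed by each table in each of its most recent committed epochs.
    samples: HashMap<u32, VecDeque<u64>>,
}

impl TableThroughputStats {
    pub fn new(window_size: usize) -> Self {
        assert!(window_size > 0, "window_size must be positive");
        Self { window_size, samples: HashMap::new() }
    }

    /// Called once per commit_epoch with the bytes each table flushed in that epoch.
    /// Tables absent from `flushed` receive no sample for this epoch.
    pub fn on_commit_epoch(&mut self, flushed: &HashMap<u32, u64>) {
        for (&table_id, &bytes) in flushed {
            let window = self.samples.entry(table_id).or_default();
            window.push_back(bytes);
            // Keep only the most recent `window_size` samples.
            if window.len() > self.window_size {
                window.pop_front();
            }
        }
    }

    /// Assumed policy: a table is a split candidate only once its window is full
    /// and every sample exceeds the threshold, so a single spike is ignored.
    pub fn is_split_candidate(&self, table_id: u32, threshold_bytes: u64) -> bool {
        match self.samples.get(&table_id) {
            Some(w) => w.len() == self.window_size && w.iter().all(|&b| b > threshold_bytes),
            None => false,
        }
    }
}
```

Because samples only arrive through on_commit_epoch, the earliest possible decision is tied to how quickly epochs are committed.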

Recently, we found that when a barrier contains a large amount of data, the statistics cannot be updated in time because new samples are only added when an epoch is committed, which depends on barrier latency. As a result, the split cannot be triggered in time.
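To make the timing problem concrete, here is a hypothetical simulation (not RisingWave code). It assumes each committed epoch contributes one sample for a table, that a split requires window_size consecutive samples above a threshold, and that commit times are given in milliseconds.

```rust
/// Hypothetical simulation: each entry is (commit_time_ms, bytes flushed by one
/// table in that epoch). Statistics are only fed at commit time, so the split
/// decision can happen no earlier than the commit of the window_size-th
/// consecutive heavy epoch, regardless of when the data was actually written.
pub fn first_split_decision_ms(
    commits: &[(u64, u64)],
    window_size: usize,
    threshold_bytes: u64,
) -> Option<u64> {
    let mut consecutive_heavy = 0usize;
    for &(commit_time_ms, bytes) in commits {
        if bytes > threshold_bytes {
            consecutive_heavy += 1;
            // The last `window_size` samples are all above the threshold.
            if consecutive_heavy >= window_size {
                return Some(commit_time_ms);
            }
        } else {
            // A light epoch breaks the streak (jitter suppression).
            consecutive_heavy = 0;
        }
    }
    None
}
```

For example, with window_size = 3 and the same per-epoch volume, commits at 1000, 2000, and 3000 ms yield a decision at 3000 ms, while commits at 10000, 20000, and 30000 ms delay it to 30000 ms.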

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response


Li0k · 2024-02-27T08:23:33Z

I suspect the write amplification within cg2 / cg3 is still caused by data misalignment.
Performing a split directly during new table creation or the recovery phase doesn't seem reasonable (we don't support merge at the moment).

I prefer to do some data analysis in the flush phase and perform a split on the SST to promote boundary alignment.

@Little-Wallace @zwang28 @hzxa21
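One possible reading of the SST split proposed above is cutting the flush output at table boundaries so that selected tables end up in their own SSTs. The sketch below is hypothetical: the function name, the hot_tables set, and the rule of cutting only around hot tables are assumptions for illustration, and entries are assumed to be sorted by (table_id, key).

```rust
use std::collections::HashSet;

/// Hypothetical sketch: given flush output sorted by (table_id, key), start a
/// new SST whenever the table_id changes and either side of the boundary is a
/// "hot" table, so hot tables' data is isolated in separate SSTs.
pub fn split_at_table_boundaries<'a, K>(
    entries: &'a [(u32, K)],
    hot_tables: &HashSet<u32>,
) -> Vec<&'a [(u32, K)]> {
    let mut ssts = Vec::new();
    let mut start = 0;
    for i in 1..entries.len() {
        let (prev, cur) = (entries[i - 1].0, entries[i].0);
        if prev != cur && (hot_tables.contains(&prev) || hot_tables.contains(&cur)) {
            ssts.push(&entries[start..i]);
            start = i;
        }
    }
    // Emit the trailing run (nothing for empty input).
    if start < entries.len() {
        ssts.push(&entries[start..]);
    }
    ssts
}
```

Cold tables adjacent to each other stay in the same SST, which avoids producing many tiny files for low-throughput tables.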

hzxa21 · 2024-02-27T08:59:42Z

I prefer to do some data analysis in the flush phase and perform a split on the SST to promote boundary alignment.

By "split", you mean putting the data of specific table IDs into separate SSTs, not splitting the compaction group, right?

If that is the case, is this a permanent change (applied to all future data of these tables) or a temporary change (applied only to data of these tables during some period)?

github-actions · 2024-06-12T08:57:14Z

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

hzxa21 · 2024-10-08T08:17:00Z

I think the new split strategy (WIP) can resolve this issue, right? cc @Li0k

Li0k added the type/bug Something isn't working label Feb 27, 2024

github-actions bot added this to the release-1.7 milestone Feb 27, 2024

Li0k modified the milestones: release-1.7, release-1.8 Mar 6, 2024

Li0k modified the milestones: release-1.8, release-1.9 Apr 8, 2024

Li0k assigned Li0k and Little-Wallace Apr 8, 2024

Li0k mentioned this issue Apr 8, 2024 in perf(storage): Improve data alignment for multi-table compaction groups #13037 (Open)

github-actions bot added the no-issue-activity label Jun 12, 2024

hzxa21 unassigned Little-Wallace Oct 8, 2024

hzxa21 modified the milestones: release-1.9, future-release-2.2 Oct 8, 2024
