blockbuilder: more tests #9314

narqo · 2024-09-17T14:46:26Z

What this PR does

~~This one sits atop #9199 for now~~

This is part of #8635; refer to it for more details.

Here we backport the test cases that cover how block-builder handles out-of-order samples.

The PR also fixes the flaky TestBlockBuilder_StartWithLookbackOnNoCommit test by making sure the test waits for the correct outcome.

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

Signed-off-by: Vladimir Varankin <[email protected]>

narqo · 2024-09-27T09:44:25Z

pkg/blockbuilder/blockbuilder_test.go

+			cortex_blockbuilder_consumer_lag_records{partition="0"} 0
+			cortex_blockbuilder_consumer_lag_records{partition="1"} 0
+		`), "cortex_blockbuilder_consumer_lag_records"))
+	}, 30*time.Second, 100*time.Millisecond)


Note: here (and in other similar changes), it's fine to wait that long. These assertions are terminal, so the test should not proceed if the outcome doesn't "eventually" happen.

dimitarvdimitrov · 2024-09-27T10:45:41Z

pkg/blockbuilder/blockbuilder_test.go

+	t.Run("future record", func(t *testing.T) {
+		// The sample from above which was in-order but the kafka record was in future
+		// should get consumed in this cycle. The other sample that is still in the future should not be consumed.
+		cycleEnd = cycleEnd.Add(cfg.ConsumeInterval)


Doing this ourselves skips testing the BB logic that does it. Don't we want to test that too?

Can we let the BB run until it has filled the bucket with some data (i.e. we expect 3 blocks; do something like Eventually(func() {bucket.countBlocks() == 3})), or until it updates some metrics? Then we can assert on what each block contains with tsdb.OpenBlock() instead of tsdb.Open.

Thoughts?

narqo · 2024-09-27

I didn't get the point, sorry. What we do here is, effectively, move the wall clock forward. We don't "skip" any of the block-builder's logic. We only point it to the portion of the partition where the cycle's data represents what the test case covers (note that we explicitly trigger nextConsumeCycle in these tests).

We would get the same if we started the block-builder and let it run for several hours. Over the course of multiple hour-long cycles, it would scan over all the test data in the partition, and we would test the blocks produced on each cycle. Without mocking the block-builder's clock, that's not ideal, of course.

dimitarvdimitrov · 2024-10-10

> We would get the same if we started the block-builder and let it run for several hours

Yes, but we'd exercise the production code logic. Right now the logic for calculating cycleEnd is in both the tests and the prod code.

But your point about waiting for hours also makes sense. Maybe smaller blocks can solve this? Like 2-3-second-long blocks with a much smaller ConsumeInterval? A small test is better than no test, so I don't want to block on this.

pracucci

Nice test, I like it! 👏 I left a couple of minor comments. No need for me to re-review it.

pracucci · 2024-10-09T09:52:49Z

pkg/blockbuilder/blockbuilder_test.go

+	require.Eventually(t, func() bool {
+		return assert.NoError(t, promtest.GatherAndCompare(reg, strings.NewReader(`


Does this work as expected? If assert.NoError() fails at least once, isn't it tracked as a failure by the testing library anyway? I'm wondering if what you really want is require.EventuallyWithT(), which was designed for this specific use case.
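
The concern can be illustrated with a small standard-library model. An assert-style helper reports a failure to the test on every unsuccessful poll, so the test ends up failed even when a later poll succeeds. require.EventuallyWithT avoids this by handing the condition a collector (`*assert.CollectT`) instead of the real test. The types and helpers below (`recorder`, `noError`, `eventually`, `eventuallyCollect`, `flaky`) are hypothetical stand-ins, not testify's implementation, and the polling uses a fixed attempt count instead of a timeout and tick.

```go
package main

import (
	"errors"
	"fmt"
)

// errorfer is the minimal reporting interface the helpers below need.
type errorfer interface {
	Errorf(format string, args ...interface{})
}

// recorder is a hypothetical stand-in for *testing.T: it only remembers
// the failures reported to it.
type recorder struct{ failures []string }

func (r *recorder) Errorf(format string, args ...interface{}) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// noError mimics an assert-style check: on error it reports a failure to t
// and returns false.
func noError(t errorfer, err error) bool {
	if err != nil {
		t.Errorf("unexpected error: %v", err)
		return false
	}
	return true
}

// eventually polls cond up to attempts times and reports whether it ever held.
// Failures that cond reported to the real t along the way are never undone.
func eventually(attempts int, cond func() bool) bool {
	for i := 0; i < attempts; i++ {
		if cond() {
			return true
		}
	}
	return false
}

// eventuallyCollect gives each attempt a fresh collector and forwards only
// the last attempt's failures to t, and only if no attempt succeeded.
func eventuallyCollect(t errorfer, attempts int, cond func(c errorfer)) bool {
	var last *recorder
	for i := 0; i < attempts; i++ {
		last = &recorder{}
		cond(last)
		if len(last.failures) == 0 {
			return true
		}
	}
	if last != nil {
		for _, f := range last.failures {
			t.Errorf("%s", f)
		}
	}
	return false
}

// flaky returns a check that fails on its first failFirst calls and
// succeeds afterwards.
func flaky(failFirst int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failFirst {
			return errors.New("metrics not updated yet")
		}
		return nil
	}
}
```

With `flaky(2)`, `eventually` returns true but leaves two failures recorded on the test, while `eventuallyCollect` returns true with none.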

pracucci · 2024-10-09T09:55:57Z

pkg/blockbuilder/blockbuilder_test.go

+		return assert.NoError(t, promtest.GatherAndCompare(reg, strings.NewReader(`
+			# HELP cortex_blockbuilder_consumer_lag_records The per-topic-partition number of records, instance needs to work through each cycle.
+			# TYPE cortex_blockbuilder_consumer_lag_records gauge
+			cortex_blockbuilder_consumer_lag_records{partition="0"} 0


We assert on cortex_blockbuilder_consumer_lag_records being 0 as the success condition for considering the block builder done. Typically metrics are initialised to 0, so 0 could also mean "no cycle has started yet". It's not a real issue here because of a technicality: this metric is defined as a prometheus.GaugeVec, and those are not initialised to 0 by default (a prometheus.Gauge is).

I'm wondering whether, when we start the block builder, we should initialise the cortex_blockbuilder_consumer_lag_records metric for all owned partitions to -1, to clearly signal that block building hasn't started yet.
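
A rough model of the proposal, using a plain map rather than the Prometheus client: every owned partition starts at a sentinel of -1, so a reading of 0 can only mean that a cycle has run and caught up. The `lagGauges` type, its methods, and the `notStarted` constant are hypothetical names for this sketch and are not part of any library.

```go
package main

// notStarted is the sentinel proposed in the review: no cycle has run yet.
const notStarted = -1

// lagGauges is a hypothetical, map-based model of a labelled gauge keyed by
// partition. It is not the Prometheus client API.
type lagGauges struct {
	values map[int32]float64
}

// newLagGauges seeds every owned partition with the sentinel, so that no
// partition reads 0 before its first cycle.
func newLagGauges(owned []int32) *lagGauges {
	g := &lagGauges{values: make(map[int32]float64, len(owned))}
	for _, p := range owned {
		g.values[p] = notStarted
	}
	return g
}

// set records the lag observed for a partition at the end of a cycle.
func (g *lagGauges) set(partition int32, lag float64) {
	g.values[partition] = lag
}

// caughtUp reports whether every owned partition has published a lag of 0.
// Partitions still at the sentinel count as not done.
func (g *lagGauges) caughtUp() bool {
	for _, v := range g.values {
		if v != 0 {
			return false
		}
	}
	return len(g.values) > 0
}
```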

pracucci · 2024-10-09T10:07:34Z

pkg/blockbuilder/blockbuilder_test.go

@@ -287,6 +280,121 @@ func TestBlockBuilder_WithMultipleTenants(t *testing.T) {
 	}
 }

+func TestBlockBuilder_WithOutOfOrderRecordsAndSamples(t *testing.T) {


[nit] This test assumes ConsumeInterval is 1h, which is the default, but the default may change in the future. To keep this test stable, I suggest overriding ConsumeInterval to 1h in the test code and adding a comment explaining that this is an assumption of the test, regardless of what the default value will be in the future.

Should we do the same for ConsumeIntervalBuffer?

narqo force-pushed the vldmr/bb-upstream-tests branch from 748286a to 19833e8 Compare September 26, 2024 20:50

narqo changed the title ~~blockbuilder: test out-of-order samples and records~~ blockbuilder: more tests Sep 26, 2024

narqo added 2 commits September 27, 2024 11:37

blockbuilder: test out-of-order samples and records

aaf92b4

Signed-off-by: Vladimir Varankin <[email protected]>

increase eventual timeout in test

b0c8533

Signed-off-by: Vladimir Varankin <[email protected]>

narqo force-pushed the vldmr/bb-upstream-tests branch from 19833e8 to b0c8533 Compare September 27, 2024 09:38

narqo marked this pull request as ready for review September 27, 2024 09:40

narqo requested a review from a team as a code owner September 27, 2024 09:40

narqo requested review from codesome and dimitarvdimitrov September 27, 2024 09:40

pracucci approved these changes Oct 9, 2024
