blockbuilder: more tests #9314
Signed-off-by: Vladimir Varankin <[email protected]>
cortex_blockbuilder_consumer_lag_records{partition="0"} 0
cortex_blockbuilder_consumer_lag_records{partition="1"} 0
`), "cortex_blockbuilder_consumer_lag_records"))
}, 30*time.Second, 100*time.Millisecond)
Note: here (and in other similar changes), it's fine to wait that long. These assertions are terminal, so the test should not proceed if the outcome doesn't "eventually" happen.
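The note above can be illustrated with a minimal polling helper. This is only a sketch in plain Go, not testify's implementation: `eventually` and `demo` are hypothetical names. It shows that a generous timeout only bounds the failure case, because the helper returns as soon as the condition holds.

```go
package main

import (
	"fmt"
	"time"
)

// eventually polls cond every tick until it returns true or waitFor elapses.
// Hypothetical stand-in for require.Eventually, written for illustration only.
func eventually(cond func() bool, waitFor, tick time.Duration) bool {
	deadline := time.Now().Add(waitFor)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// demo runs a condition that becomes true on the third poll under a 30s
// timeout, and reports the outcome, the number of polls, and whether the
// whole wait finished in well under the timeout.
func demo() (ok bool, polls int, fast bool) {
	start := time.Now()
	ok = eventually(func() bool {
		polls++
		return polls >= 3
	}, 30*time.Second, 10*time.Millisecond)
	return ok, polls, time.Since(start) < 5*time.Second
}

func main() {
	ok, polls, fast := demo()
	fmt.Println("condition met:", ok, "polls:", polls, "returned early:", fast)
}
```

The long timeout is paid in full only when the condition never becomes true, which is exactly the case where the test has nothing left to do but fail.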
t.Run("future record", func(t *testing.T) {
	// The sample from above which was in-order but the kafka record was in future
	// should get consumed in this cycle. The other sample that is still in the future should not be consumed.
	cycleEnd = cycleEnd.Add(cfg.ConsumeInterval)
Doing this ourselves skips testing the BB logic which does it. Don't we want to test that too?
Can we let the BB run until it has filled the bucket with some data (e.g. we expect 3 blocks, so do something like Eventually(func() bool { return bucket.countBlocks() == 3 })), or until it updates some metrics? Then we can assert on what each block contains with tsdb.OpenBlock() instead of tsdb.Open.
Thoughts?
I didn't get the point, sorry. Here, what we do is, effectively, move the wall clock forward. It's not that we "skip" any of the block-builder's logic. We only point it to a portion of the partition where the cycle's data represents what the test case tests (note that we explicitly trigger nextConsumeCycle in these tests).
We would get the same if we started the block-builder and let it run for several hours. Over the course of multiple hour-long cycles, it would have scanned all the test data in the partition, and we could have tested the blocks produced on each cycle. Without mocking the block-builder's clock, that's not ideal, of course.
"We would get the same if we started the block-builder and let it run for several hours"
Yes, but we'd exercise the production code logic. Right now the logic for calculating cycleEnd is in both the tests and the prod code.
But your point about waiting for hours also makes sense. Maybe smaller blocks can solve this? Like 2-3-second-long blocks with a much smaller ConsumeInterval? A small test is better than no test, so I don't want to block on this.
Nice test, I like it! 👏 I left a couple of minor comments. No need for me to re-review it.
require.Eventually(t, func() bool {
	return assert.NoError(t, promtest.GatherAndCompare(reg, strings.NewReader(`
Does this work as expected? If the assert.NoError() fails at least once, isn't it tracked anyway as a failure by the testing library? I'm wondering if what you really want is require.EventuallyWithT(), which was designed for this specific use case.
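To make the concern concrete, here is a toy model of the two patterns in plain Go. It is not testify's code: `fakeT`, `collectT`, `assertNoError`, `eventually`, and `eventuallyWithT` are hypothetical stand-ins, and sleeping between attempts is left out. The sketch assumes that an assert-style helper reports failures to whatever `t` it is given; under that assumption, a failure reported to the test's own `t` stays on record even when a later poll succeeds, while a per-attempt collector only surfaces the failures of the last attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// reporter is the small part of testing.T that the assertions below need.
type reporter interface {
	Errorf(format string, args ...interface{})
}

// fakeT stands in for *testing.T: any Errorf call marks the test failed for good.
type fakeT struct{ failed bool }

func (t *fakeT) Errorf(string, ...interface{}) { t.failed = true }

// collectT records the failures of a single polling attempt.
type collectT struct{ errs []string }

func (c *collectT) Errorf(format string, args ...interface{}) {
	c.errs = append(c.errs, fmt.Sprintf(format, args...))
}

// assertNoError mimics an assert-style helper: report on failure, return success.
func assertNoError(t reporter, err error) bool {
	if err != nil {
		t.Errorf("unexpected error: %v", err)
		return false
	}
	return true
}

// eventually retries cond up to attempts times.
func eventually(attempts int, cond func() bool) bool {
	for i := 0; i < attempts; i++ {
		if cond() {
			return true
		}
	}
	return false
}

// eventuallyWithT hands each attempt a fresh collector and forwards failures
// to t only if the final attempt still failed.
func eventuallyWithT(t reporter, attempts int, cond func(c *collectT)) bool {
	var last *collectT
	for i := 0; i < attempts; i++ {
		last = &collectT{}
		cond(last)
		if len(last.errs) == 0 {
			return true
		}
	}
	if last != nil {
		for _, e := range last.errs {
			t.Errorf("%s", e)
		}
	}
	return false
}

// flaky returns a check that fails on its first two calls and passes afterwards.
func flaky() func() error {
	calls := 0
	return func() error {
		calls++
		if calls < 3 {
			return errors.New("metrics not ready")
		}
		return nil
	}
}

// demo runs the same flaky check through both patterns and reports
// whether each one left the (fake) test marked as failed.
func demo() (plainFailed, withTFailed bool) {
	t1, check1 := &fakeT{}, flaky()
	eventually(5, func() bool { return assertNoError(t1, check1()) })

	t2, check2 := &fakeT{}, flaky()
	eventuallyWithT(t2, 5, func(c *collectT) { assertNoError(c, check2()) })
	return t1.failed, t2.failed
}

func main() {
	plain, withT := demo()
	fmt.Println("assert on t inside eventually, test failed:", plain)
	fmt.Println("assert on per-attempt collector, test failed:", withT)
}
```

In the first pattern the condition eventually returns true, yet the early failed attempts have already marked the test as failed. The collector-based variant avoids that by discarding the failures of every attempt except the last.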
return assert.NoError(t, promtest.GatherAndCompare(reg, strings.NewReader(`
# HELP cortex_blockbuilder_consumer_lag_records The per-topic-partition number of records, instance needs to work through each cycle.
# TYPE cortex_blockbuilder_consumer_lag_records gauge
cortex_blockbuilder_consumer_lag_records{partition="0"} 0
We assert on cortex_blockbuilder_consumer_lag_records being 0 as the success condition for considering the block builder done. Typically metrics are initialised to 0, so 0 could also mean "no cycle has started yet". It's not a real issue here because of a technicality: this metric is defined as a prometheus.GaugeVec, and those are not initialised with 0 by default (a prometheus.Gauge is).
I'm wondering if, when we start the block builder, we should initialise the cortex_blockbuilder_consumer_lag_records metric for all owned partitions to the value of -1, to clearly signal that block building hasn't started yet.
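The distinction drawn above can be sketched with a toy model in plain Go. This is not the Prometheus client API: `gaugeVec`, `caughtUp`, and the `lagNotStarted` sentinel are hypothetical names used only to show the three states (no series exposed, series initialised to -1, series at 0 after a cycle).

```go
package main

import (
	"fmt"
	"sort"
)

// lagNotStarted is a hypothetical sentinel meaning "no cycle has run yet".
const lagNotStarted = -1

// gaugeVec is a toy model of a labelled gauge: a series exists only once
// a value has been set for its label. It is not the Prometheus client API.
type gaugeVec struct{ series map[string]float64 }

func newGaugeVec() *gaugeVec { return &gaugeVec{series: map[string]float64{}} }

func (g *gaugeVec) set(partition string, v float64) { g.series[partition] = v }

// expose renders the existing series, sorted by partition label.
func (g *gaugeVec) expose(name string) []string {
	parts := make([]string, 0, len(g.series))
	for p := range g.series {
		parts = append(parts, p)
	}
	sort.Strings(parts)
	out := make([]string, 0, len(parts))
	for _, p := range parts {
		out = append(out, fmt.Sprintf("%s{partition=%q} %v", name, p, g.series[p]))
	}
	return out
}

// caughtUp reports whether every owned partition exposes a lag of exactly 0.
func caughtUp(g *gaugeVec, owned []string) bool {
	for _, p := range owned {
		if v, ok := g.series[p]; !ok || v != 0 {
			return false
		}
	}
	return true
}

// demo walks through startup, explicit initialisation, and a finished cycle.
func demo() (atStart int, afterInit []string, doneAfterInit, doneAfterCycle bool) {
	const name = "cortex_blockbuilder_consumer_lag_records"
	owned := []string{"0", "1"}
	lag := newGaugeVec()
	atStart = len(lag.expose(name)) // nothing is exposed before the first set

	for _, p := range owned {
		lag.set(p, lagNotStarted) // explicit "not started" marker
	}
	afterInit = lag.expose(name)
	doneAfterInit = caughtUp(lag, owned)

	for _, p := range owned {
		lag.set(p, 0) // a cycle has consumed everything
	}
	doneAfterCycle = caughtUp(lag, owned)
	return
}

func main() {
	atStart, afterInit, doneInit, doneCycle := demo()
	fmt.Println("series at start:", atStart)
	fmt.Println("after init:", afterInit, "caught up:", doneInit)
	fmt.Println("after cycle caught up:", doneCycle)
}
```

With the sentinel in place, a value of 0 can only mean that a cycle ran and left no lag, so the assertion no longer depends on the series being absent at startup.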
@@ -287,6 +280,121 @@ func TestBlockBuilder_WithMultipleTenants(t *testing.T) {
	}
}

func TestBlockBuilder_WithOutOfOrderRecordsAndSamples(t *testing.T) {
[nit] This test assumes ConsumeInterval is 1h, which is the default, but it may change in the future. To keep this test stable, I would suggest overriding ConsumeInterval to 1h in the code and adding a comment to explain that it's an assumption of the test, regardless of what the default value will be in the future.
Should we do the same for ConsumeIntervalBuffer?
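A minimal sketch of that suggestion, assuming a trimmed-down config type: `Config` and `pinForTest` are hypothetical names, and the 15-minute buffer is a placeholder, not a value taken from the PR. The helper overrides whatever defaults it receives, so the test's hand-built timestamps stay valid if the defaults change.

```go
package main

import (
	"fmt"
	"time"
)

// Config is a hypothetical, trimmed-down stand-in for the block-builder config.
type Config struct {
	ConsumeInterval       time.Duration
	ConsumeIntervalBuffer time.Duration
}

// pinForTest overrides the interval settings regardless of the defaults passed in.
func pinForTest(cfg Config) Config {
	// The test data is laid out around 1h cycles; this is an assumption of the
	// test, independent of whatever the default is.
	cfg.ConsumeInterval = time.Hour
	// Placeholder value for illustration only.
	cfg.ConsumeIntervalBuffer = 15 * time.Minute
	return cfg
}

// demo simulates a future change of the defaults and shows that the pinned
// config is unaffected by it.
func demo() Config {
	changedDefaults := Config{
		ConsumeInterval:       30 * time.Minute,
		ConsumeIntervalBuffer: 5 * time.Minute,
	}
	return pinForTest(changedDefaults)
}

func main() {
	cfg := demo()
	fmt.Println("ConsumeInterval:", cfg.ConsumeInterval)
	fmt.Println("ConsumeIntervalBuffer:", cfg.ConsumeIntervalBuffer)
}
```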
What this PR does
This one sits atop #9199 for now. This is part of #8635; refer to it for more details.
Here we backport the test cases for how block-builder handles out-of-order samples.
Also, the PR fixes a flaky TestBlockBuilder_StartWithLookbackOnNoCommit test by making sure the test waits for the correct outcome.
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.