sharding operators: topk / bottomk #7582

wanghaao · 2024-03-11T12:07:11Z

What this PR does

Sharding topk/bottomk operators

CLAassistant · 2024-03-11T12:07:16Z

All committers have signed the CLA.

duricanikolic · 2024-03-12T09:58:16Z

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

wanghaao · 2024-03-12T10:07:49Z

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

Done.

wanghaao · 2024-03-15T07:35:29Z

hello?

ying-jeanne · 2024-07-09T12:03:23Z

hello?

Hi @wanghaao, thanks for your contribution. This is a feature we had an internally discussion earlier, but it is not as straight forward as just shard the topk. Here is why originally from @pracucci internally, I would copy it here.

At first those operations seems shardable since each shard would returns the top/bottomk. We could then compute the top/bottomk of all shards.

It's not that easy. The topk (or bottomk) of all series is not necessarily the topk(concat(topk(shard 1), topk(shard2), ...)).

For example, let's consider the following data points:

foo{name=”a”,shard=”1”} 10
foo{name=”b”,shard=”1”} 9
foo{name=”e”,shard=”1”} 8
foo{name=”c”,shard=”2”} 9
foo{name=”d”,shard=”2”} 8
foo{name=”e”,shard=”2”} 8
Let's assume we run the query topk(2, sum by (name) (foo)). The parallel version of it is:

topk(2, sum by (name) (foo{shard=”1”})) -> topk(2, [a:10,b:9,e:8]) = [a:10,b:9]
topk(2, sum by (name) (foo{shard=”2”})) -> topk(2, [c:9,d:8,e:8]) = [c:10,d:9]

topk(2, concat(shard1, shard2)) -> topk(2, [a:10,b:9,c:10,d:9]) = [a:10,c:10]
The result is incorrect. We would have expected [e:16,a:10].

We should investigate if topk(k * shards) for the per-shard query guarantees correct results.

So in order to have the topk after concat, we should have K + B (a buffer) instead of K in the subquery, but I don't have a good mathematic equation to get that B on top of my head, since it would related to how even the data are distributed. That being said we won't accept this PR, but if you figure out a nice way to solve ⬆️ , feel free to comment here and update the PR.

colega · 2024-10-16T14:09:27Z

So in order to have the topk after concat, we should have K + B (a buffer) instead of K in the subquery

There's no buffer that could solve if there's a sum aggregation inside of topk.

topk and bottomk can be sharded if its expression is a plain vector selector.

My intuition tells me that topk(n, max by (...) (...)) should also work, and bottomk(n, min by (...) (...)).

wanghaao requested a review from a team as a code owner March 11, 2024 12:07

wanghaao force-pushed the main branch 2 times, most recently from 7ed0ff5 to 8efab08 Compare March 11, 2024 12:17

sharding operators: topk and bottomk

0cec1b8

wanghaao force-pushed the main branch from 8efab08 to 0cec1b8 Compare March 11, 2024 13:53

duricanikolic added the release/notified-changelog-cut label Mar 12, 2024

wanghaao closed this Mar 12, 2024

wanghaao force-pushed the main branch from 0cec1b8 to afeffa7 Compare March 12, 2024 10:03

Merge branch 'main' of github.com:wanghaao/mimir

508fffc

wanghaao reopened this Mar 12, 2024

56quarters closed this Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sharding operators: topk / bottomk #7582

sharding operators: topk / bottomk #7582

wanghaao commented Mar 11, 2024 •

edited

Loading

CLAassistant commented Mar 11, 2024 •

edited

Loading

duricanikolic commented Mar 12, 2024

wanghaao commented Mar 12, 2024

wanghaao commented Mar 15, 2024

ying-jeanne commented Jul 9, 2024 •

edited

Loading

colega commented Oct 16, 2024

sharding operators: topk / bottomk #7582

sharding operators: topk / bottomk #7582

Conversation

wanghaao commented Mar 11, 2024 • edited Loading

What this PR does

CLAassistant commented Mar 11, 2024 • edited Loading

duricanikolic commented Mar 12, 2024

wanghaao commented Mar 12, 2024

wanghaao commented Mar 15, 2024

ying-jeanne commented Jul 9, 2024 • edited Loading

colega commented Oct 16, 2024

wanghaao commented Mar 11, 2024 •

edited

Loading

CLAassistant commented Mar 11, 2024 •

edited

Loading

ying-jeanne commented Jul 9, 2024 •

edited

Loading