Query with `resource_string_host$$name` filter on main table doesn't skip data effectively #6500

srikanthccv · 2024-11-21T13:54:15Z

In most cases a useful skip index requires a strong correlation between the primary key and the targeted, non-primary column/expression. If there is no correlation (as in the above diagram), the chances of the filtering condition being met by at least one of the rows in the block of several thousand values are high and few blocks will be skipped. In contrast, if a range of values for the primary key (like time of day) is strongly associated with the values in the potential index column (such as television viewer ages), then a minmax type of index is likely to be beneficial. Note that it may be possible to increase this correlation when inserting data, either by including additional columns in the sorting/ORDER BY key, or batching inserts in a way that values associated with the primary key are grouped on insert. For example, all of the events for a particular site_id could be grouped and inserted together by the ingest process, even if the primary key is a timestamp containing events from a large number of sites. This will result in many granules that contain only a few site ids, so many blocks could be skipped when searching by a specific site_id value.

Since `resource_fingerprint` has a strong, direct correlation with `resource_string_host$$name`, I would expect a skip index on `resource_string_host$$name` to be very effective at skipping granules, without having to write a subquery. Are we doing something incorrectly or ineffectively? If not, how do we explain the ineffective data skipping indexes?
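To make the correlation argument in the excerpt concrete, here is a minimal Python sketch (not ClickHouse code) that simulates an idealized skip index: a granule can be skipped only if it contains no row matching the filter. The granule size of 8192 is ClickHouse's default `index_granularity`; the row count, the number of sites, the target id, and the `granules_to_read` helper are all made up for illustration, and a real skip index works on blocks of one or more granules and may have false positives.

```python
import random

GRANULE_SIZE = 8192  # ClickHouse's default index_granularity


def granules_to_read(values, target, granule_size=GRANULE_SIZE):
    """Return (granules containing target, total granules).

    A granule that contains the target cannot be skipped, so the first
    number is the best case for any skip index on this column.
    """
    hits = 0
    total = 0
    for start in range(0, len(values), granule_size):
        granule = values[start:start + granule_size]
        total += 1
        if target in granule:
            hits += 1
    return hits, total


rng = random.Random(0)  # fixed seed so the run is repeatable
n_sites = 100                 # hypothetical number of distinct site ids
n_rows = 50 * GRANULE_SIZE    # hypothetical table size: 50 granules

# No correlation with the sort order: site ids arrive in arbitrary order.
uncorrelated = [rng.randrange(n_sites) for _ in range(n_rows)]

# Strong correlation: the same rows, but stored grouped by site id.
correlated = sorted(uncorrelated)

for name, column in (("uncorrelated", uncorrelated), ("correlated", correlated)):
    hits, total = granules_to_read(column, target=42)
    print(f"{name}: must read {hits} of {total} granules")
```

With uncorrelated ordering, practically every granule contains at least one matching row, so nothing can be skipped; once rows are grouped by the filtered column, the matches are confined to a handful of adjacent granules. That is the behavior I would expect from our layout if the correlation really holds on disk.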


srikanthccv · 2024-11-21T13:55:03Z

The first paragraph above is an excerpt from https://clickhouse.com/docs/en/optimize/skipping-indexes#skip-best-practices

srikanthccv mentioned this issue Nov 22, 2024

Review and update secondary indexes for materialized columns #6503

Open

2 tasks
