Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query with resource_string_host$$name filter on main table doesn't skip data effectively #6500

Open
srikanthccv opened this issue Nov 21, 2024 · 1 comment

Comments

@srikanthccv
Copy link
Member

In most cases a useful skip index requires a strong correlation between the primary key and the targeted, non-primary column/expression. If there is no correlation (as in the above diagram), the chances of the filtering condition being met by at least one of the rows in the block of several thousand values is high and few blocks will be skipped. In constrast, if a range of values for the primary key (like time of day) is strongly associated with the values in the potential index column (such as television viewer ages), then a minmax type of index is likely to be beneficial. Note that it may be possible to increase this correlation when inserting data, either by including additional columns in the sorting/ORDER BY key, or batching inserts in a way that values associated with the primary key are grouped on insert. For example, all of the events for a particular site_id could be grouped and inserted together by the ingest process, even if the primary key is a timestamp containing events from a large number of sites. This will result in many granules that contains only a few site ids, so many blocks could be skipped when searching by a specific site_id value.

Since the resource_fingerprint has a strong direct correlation with the resource_string_host$$name, I would expect it to be very effective in skipping granules without having to write a subquery. Are we doing something incorrectly/ineffectively? Otherwise, how do we explain inefficient data skip indexes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant