In most cases a useful skip index requires a strong correlation between the primary key and the targeted, non-primary column/expression. If there is no correlation (as in the above diagram), the chances of the filtering condition being met by at least one of the rows in a block of several thousand values are high, and few blocks will be skipped. In contrast, if a range of values for the primary key (like time of day) is strongly associated with the values in the potential index column (such as television viewer ages), then a minmax type of index is likely to be beneficial. Note that it may be possible to increase this correlation when inserting data, either by including additional columns in the sorting/ORDER BY key, or by batching inserts so that values associated with the primary key are grouped on insert. For example, all of the events for a particular site_id could be grouped and inserted together by the ingest process, even if the primary key is a timestamp containing events from a large number of sites. This will result in many granules that contain only a few site ids, so many blocks could be skipped when searching by a specific site_id value.
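To make the correlation argument concrete, here is a rough Python simulation of how a minmax-style skip index decides which granules to read. It is only a sketch under simplifying assumptions, not how ClickHouse implements it: one index entry per granule, a granule size of 8192 rows, and two idealized layouts (site ids perfectly interleaved versus perfectly grouped). The names `granules_to_read`, `interleaved`, and `grouped` are made up for the illustration.

```python
GRANULE_SIZE = 8192   # assumed rows per granule for this sketch
N_GRANULES = 50
N_SITES = 50
N_ROWS = GRANULE_SIZE * N_GRANULES


def granules_to_read(values, target, granule_size=GRANULE_SIZE):
    """Return (granules that cannot be skipped, total granules).

    A minmax index stores only the min and max of each granule, so a
    granule can be skipped only when target falls outside [min, max].
    """
    must_read = 0
    total = 0
    for start in range(0, len(values), granule_size):
        block = values[start:start + granule_size]
        total += 1
        if min(block) <= target <= max(block):
            must_read += 1
    return must_read, total


# No correlation with row order: every granule contains every site id.
interleaved = [i % N_SITES for i in range(N_ROWS)]

# Strong correlation with row order: each granule holds a single site id,
# as if the ingest process grouped events per site before inserting.
grouped = [i // GRANULE_SIZE for i in range(N_ROWS)]

if __name__ == "__main__":
    target_site = 7
    print("interleaved:", granules_to_read(interleaved, target_site))
    print("grouped:    ", granules_to_read(grouped, target_site))
```

With the interleaved layout every granule's min/max range covers the target, so nothing can be skipped. With the grouped layout only one granule has to be read. Real data sits somewhere between these two extremes, which is why the physical ordering of the indexed column matters more than its logical relationship to other columns.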
Since the resource_fingerprint has a strong direct correlation with the resource_string_host$$name, I would expect it to be very effective in skipping granules without having to write a subquery. Are we doing something incorrectly/ineffectively? Otherwise, how do we explain inefficient data skip indexes?