Hi! I want to run ANN search on time-series data, so I'm trying to index my tables on multiple columns (the embedding column and the timestamp column) to take full advantage of Timescale hypertable functionality. I haven't been able to find any documentation on how to do this.
For example, I would like something like:
CREATE INDEX my_index ON my_hypertable USING diskann (timestamp, embedding);
It seems that pgvector only supports partial (conditional) indexes, e.g. CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);, but for obvious reasons that approach doesn't work for time-based partitions.
Being able to query long-term time-series data would be a major advantage for us, so we'd love to see this added if it isn't already available. If it isn't, is this functionality feasible, or on the roadmap as a future enhancement?
@hamishc Vector indexes can't be multi-column right now. Instead, partition the table by time using Timescale's hypertables and create a regular diskann index on the embedding column. Query execution will then proceed roughly as follows:
1. The query planner excludes any chunks (partitions) that cannot contain matching rows, based on the time constraints in your query.
2. For each remaining chunk, that chunk's index returns the rows with the closest vectors.
3. The executor then filters out any rows that don't match the time filter.
Step 1 quickly throws away most of the data that falls outside the time constraints. Step 2 uses the full power of the vector index. Step 3 does the final cleanup.
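A minimal sketch of the suggested setup, assuming the timescaledb and vectorscale (pgvectorscale) extensions are installed. The table my_hypertable, its columns, the 3-dimensional vector, the one-day chunk interval and the query vector are all hypothetical placeholders; cosine distance (vector_cosine_ops with the <=> operator) is just one choice of operator class.

```sql
-- Assumes timescaledb and vectorscale are available on the server.
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;  -- CASCADE also installs pgvector

-- Hypothetical table; names and vector dimension are illustrative only.
CREATE TABLE my_hypertable (
    id        bigint,
    ts        timestamptz NOT NULL,
    embedding vector(3)   NOT NULL
);

-- Partition by time into one-day chunks, without the default index on ts.
SELECT create_hypertable('my_hypertable', 'ts',
                         chunk_time_interval    => interval '1 day',
                         create_default_indexes => FALSE);

-- Single-column diskann index on the embedding only.
CREATE INDEX my_hypertable_embedding_idx
    ON my_hypertable USING diskann (embedding vector_cosine_ops);

-- ANN query restricted to a time window.
SELECT id, ts, embedding <=> '[0.1, 0.2, 0.3]' AS distance
FROM my_hypertable
WHERE ts >= now() - interval '7 days'
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 10;
```

Because the index is created on the hypertable, TimescaleDB creates a corresponding index on each chunk, which is what allows the nearest-neighbour lookup in step 2 to happen per chunk.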
Oh, so hypertables don't actually need an index on the time column to use the partitions? When I created the hypertable I ran it with create_default_indexes => FALSE, so I assumed any index had to cover both the desired column and the time column (which is what the Timescale docs seem to suggest).
I've validated with the query planner that it's using the indexes and only running on the requested partition, so it's working either way! Thanks for your help!
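One way to do that check is with EXPLAIN. This is a sketch that again assumes a hypothetical my_hypertable with ts and embedding columns and an arbitrary query vector. With constant time bounds, chunks outside the range should be excluded at planning time and not appear in the plan at all. No index on ts is needed for this, because exclusion relies on each chunk's time range rather than on an index.

```sql
-- Hypothetical table, column names and dates; adjust to your schema.
-- Constant bounds let the planner exclude non-matching chunks up front.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, ts
FROM my_hypertable
WHERE ts >= '2024-06-01' AND ts < '2024-06-08'
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 10;
```

Only chunks overlapping the requested week should be listed, ideally each scanned through its own diskann index. Whether the planner actually picks the index depends on table statistics and settings.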