Revised VSS RFC #15
base: main
Signed-off-by: Allen Samuels <[email protected]>
Other open questions that we should document for completeness:
- Will ACLs be supported?
- Is active defragmentation supported?
- Is the RDB format compatible with RediSearch?
- I assume this is a subset of RediSearch functionality, but it would be good to document that along with what is missing.
- How will slot migration work, if at all? Will we also copy indexes between nodes when using the CLI?
- Are any core changes required? We talked about memory sharing and some new keyspace notifications.
- How does redirecting read requests to the primary work in cluster mode? Can you force strongly consistent reads? Do we care about the consistency model at all?
- Are we allowing search in Lua and MULTI/EXEC? Will it cause latency issues? Does Redis allow it?
VSS.md
The command returns either an array if successful or an error.
If `NOCONTENT` is specified, then the output is .....
It's a mystery?
Suggested changes:
On success, the first entry in the response array represents the total number of qualified matching elements, followed by one array entry for each matching element. Note that the number of response entries may differ from the total number of matching elements reported in the first entry only if the `LIMIT` option is specified.
When `NOCONTENT` is specified, each entry in the response contains only the matching keys. Otherwise, each entry includes the matching key, followed by an array of the returned fields.
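The layout described in the suggestion can be consumed on the client side roughly as follows. This is a minimal Python sketch, not part of the module or any client library: `parse_search_reply` is a hypothetical helper, and it assumes the flat layout quoted from VSS.md (a count, then key names, each followed by its field array when `NOCONTENT` is absent), with field arrays alternating names and values as in RediSearch.

```python
from typing import Any, Dict, List, Tuple

Hit = Tuple[str, Dict[str, Any]]


def parse_search_reply(reply: List[Any], nocontent: bool) -> Tuple[int, List[Hit]]:
    """Split a flat FT.SEARCH-style reply into (first_entry, [(key, fields), ...])."""
    if not reply:
        raise ValueError("empty reply")
    first, body = reply[0], reply[1:]
    hits: List[Hit] = []
    if nocontent:
        # NOCONTENT: every remaining entry is a bare key name.
        hits = [(key, {}) for key in body]
    else:
        # Otherwise entries come in pairs: key name, then [name, value, ...].
        if len(body) % 2:
            raise ValueError("expected key / field-array pairs")
        for key, fields in zip(body[0::2], body[1::2]):
            hits.append((key, dict(zip(fields[0::2], fields[1::2]))))
    return first, hits


total, hits = parse_search_reply(
    [5, "doc:1", ["score", "0.12"], "doc:2", ["score", "0.34"]], nocontent=False)
print(total, len(hits))  # 5 2
```

Here the first entry (5) exceeds the two returned hits, which is the `LIMIT` case the suggestion calls out.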
VSS.md
If `NOCONTENT` is specified, then the output is .....
If `NOCONTENT` is not specified, then the output is an array of (2*N)+1 entries, where N is the number of keys output from the search. The first entry in the array is the value N, which is followed by N pairs of entries, one per key found. Each pair of entries consists of the key name followed by an array which is the result value for that key.
The protocol has N built in; why is it returned as the first entry?
Right, see my suggested changes above.
Compatibility with RediSearch.
So, we have no idea why RediSearch added it?
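For comparison with the suggestion, the (2*N)+1 layout in the quoted VSS.md text can be assembled server-side as in this minimal sketch. `build_search_reply` is hypothetical; it follows the quoted text in putting N, the number of returned keys, first (the suggested change would put the total match count there instead) and assumes each hit's fields are a name-to-value map flattened into `[name, value, ...]`. In this version the first entry repeats what the array length already implies, which is the redundancy questioned in this thread.

```python
from typing import Dict, List, Tuple, Union

Entry = Union[int, str, List[str]]


def build_search_reply(hits: List[Tuple[str, Dict[str, str]]]) -> List[Entry]:
    """Lay out N hits as [N, key_1, fields_1, ..., key_N, fields_N]."""
    reply: List[Entry] = [len(hits)]
    for key, fields in hits:
        reply.append(key)
        # Flatten the field map into [name, value, name, value, ...].
        flat: List[str] = []
        for name, value in fields.items():
            flat.extend([name, value])
        reply.append(flat)
    return reply


reply = build_search_reply([("doc:1", {"v": "1"}), ("doc:2", {"v": "2"})])
print(len(reply))  # 5, i.e. (2 * 2) + 1
```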
1. **reader-threads:** (Integer) Controls the number of threads executing queries.
2. **writer-threads:** (Integer) Controls the number of threads processing index mutations.
3. **use-coordinator:** (boolean) Cluster mode enabler.
There isn't any discussion of the coordinator here, is that in scope for the initial version?
Yes, it's part of the initial version and it's the way to enable cluster mode.
Yes, will add.
Followup, is there a reason to disable the coordinator for cluster mode? Should it just automatically get enabled for cluster mode?
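The two thread-count knobs suggest separate worker pools for queries and index mutations. The sketch below shows that split with Python's `concurrent.futures`; `SearchConfig` and `SearchWorkers` are hypothetical names, the defaults are illustrative, and the module's actual threading model may differ. `use_coordinator` is carried along but not modeled.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SearchConfig:
    # Hypothetical mirror of the quoted knobs; defaults are illustrative only.
    reader_threads: int = 4
    writer_threads: int = 2
    use_coordinator: bool = False  # not used in this sketch


class SearchWorkers:
    """Runs queries and index mutations on separately sized thread pools."""

    def __init__(self, cfg: SearchConfig) -> None:
        if cfg.reader_threads < 1 or cfg.writer_threads < 1:
            raise ValueError("thread counts must be positive")
        self.readers = ThreadPoolExecutor(
            max_workers=cfg.reader_threads, thread_name_prefix="reader")
        self.writers = ThreadPoolExecutor(
            max_workers=cfg.writer_threads, thread_name_prefix="writer")

    def submit_query(self, fn: Callable[..., Any], *args: Any) -> Future:
        # Queries are queued separately from index mutations.
        return self.readers.submit(fn, *args)

    def submit_mutation(self, fn: Callable[..., Any], *args: Any) -> Future:
        return self.writers.submit(fn, *args)

    def shutdown(self) -> None:
        self.readers.shutdown(wait=True)
        self.writers.shutdown(wait=True)


workers = SearchWorkers(SearchConfig(reader_threads=2, writer_threads=1))
print(workers.submit_query(sum, [1, 2, 3]).result())  # 6
workers.shutdown()
```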
VSS.md
- **search_total_indexed_hash_keys** (Integer) Total count of HASH keys for all indexes
- **search_number_of_indexes** (Integer) Index schema total count
- **search_number_of_attributes** (Integer) Total count of attributes for all indexes
- **search_failure_requests_count** (Integer) A count of all failed requests
What is the definition of a failed request? One that errors?
This captures any FT.SEARCH runtime errors, including invalid syntax.
Yeah, I think typically those are called errors throughout the codebase.
- **search_add_subscription_successful_count** (Integer) Count of successfully added subscriptions
- **search_add_subscription_failure_count** (Integer) Count of failures of adding subscriptions
- **search_add_subscription_skipped_count** (Integer) Count of skipped subscription adding processes
- **search_modify_subscription_failure_count** (Integer) Count of failed subscription modifications
Just making one comment: I don't really understand what most of these INFO fields are supposed to tell end users. Maybe document them as if they were for our public documentation?
Still not sure what these subscriptions are, they are only mentioned here.
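Since these metrics are exposed through INFO, which reports `name:value` lines grouped under `# Section` headers, a monitoring script could collect them as sketched here. `parse_info` is a hypothetical helper, and the `# search` section name in the sample is an assumption.

```python
from typing import Dict, Union


def parse_info(text: str) -> Dict[str, Union[int, str]]:
    """Parse INFO-style 'name:value' lines, skipping '# Section' headers and blanks."""
    metrics: Dict[str, Union[int, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # The search metrics are documented as integers; anything else stays a string.
        try:
            metrics[name] = int(value)
        except ValueError:
            metrics[name] = value
    return metrics


sample = "# search\r\nsearch_number_of_indexes:2\r\nsearch_total_indexed_hash_keys:1500\r\n"
print(parse_info(sample)["search_number_of_indexes"])  # 2
```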
- **\<index\>** (required): The name of the index you want to query.
- **\<query\>** (required): The query string; see below for details.
- **NOCONTENT** (optional): When present, only the resulting key names are returned; no key values are included.
Missing:
RETURN (optional): Specifies the fields you want to retrieve from your documents, along with any aliases for the returned values. By default, all fields are returned unless the `NOCONTENT` option is set, in which case no fields are returned. If `num` is set to 0, it behaves the same as `NOCONTENT`.
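A `RETURN` clause like the one described could be assembled as below. This is a sketch under stated assumptions: `search_args` is a hypothetical helper, and it assumes, as in RediSearch, that `num` counts every token after it, including `AS alias` pairs. The resulting list could be passed to a generic command call such as redis-py's `execute_command(*args)`.

```python
from typing import List, Optional, Sequence, Tuple


def search_args(index: str, query: str, nocontent: bool = False,
                return_fields: Optional[Sequence[Tuple[str, Optional[str]]]] = None) -> List[str]:
    """Build an FT.SEARCH argument list with optional NOCONTENT / RETURN clauses."""
    args = ["FT.SEARCH", index, query]
    if nocontent:
        args.append("NOCONTENT")
    if return_fields is not None:
        tokens: List[str] = []
        for field, alias in return_fields:
            tokens.append(field)
            if alias:
                tokens += ["AS", alias]
        # Assumption: num counts every following token, AS/alias included.
        # An empty field list yields "RETURN 0", which behaves like NOCONTENT.
        args += ["RETURN", str(len(tokens))] + tokens
    return args


print(search_args("idx", "*", return_fields=[("$.name", "name")]))
```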
ACLs are not supported in the current version.
Memory allocated by the module is not subject to active defragmentation.
No, the RDB format is not compatible with RediSearch. The module utilizes a proprietary RDB format based on protobuf, which serializes both index metadata and content. In contrast, RediSearch's RDB format only serializes the index metadata.
Yes, we should have a section which enumerates key missing functionality as bullet points to provide clarity.
Slot migration is supported. Indexes are cluster-level concepts, so they are already replicated across the cluster rather than contained within a slot. Keys from the migrated slot trigger keyspace mutation events, which are processed in the same manner as client-originated mutations, except that they do not block a client during execution.
While the module can function without engine changes, two engine changes have been introduced to enhance user experience:
Valkey doesn't support distributed transactions, so I don't see any consistency guarantee with scatter-gather across multiple shards.
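The slot-migration answer above says migrated keys flow through the same mutation path as client writes, except that no client blocks on them. A minimal Python model of that idea follows; `Mutation`, `LocalIndex`, and the `from_client` flag are hypothetical, and "blocking the client" is modeled as the caller waiting on a `Future`, a simplification of how a Valkey module would block a client connection.

```python
import queue
import threading
from concurrent.futures import Future
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Mutation:
    key: str
    vector: Optional[List[float]]  # None means the key was deleted
    from_client: bool              # False for slot-migration events
    done: Future = field(default_factory=Future)


class LocalIndex:
    """Applies every mutation on one writer thread, whatever its origin."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[float]] = {}
        self.q: "queue.Queue[Optional[Mutation]]" = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self) -> None:
        while True:
            m = self.q.get()
            if m is None:  # shutdown sentinel
                break
            # Same code path for client writes and migrated keys.
            if m.vector is None:
                self.entries.pop(m.key, None)
            else:
                self.entries[m.key] = m.vector
            m.done.set_result(True)

    def on_keyspace_event(self, m: Mutation) -> None:
        self.q.put(m)
        if m.from_client:
            # Client-originated writes wait until the index reflects them.
            m.done.result()
        # Slot-migration events return immediately; nobody waits on them.

    def close(self) -> None:
        self.q.put(None)
        self.worker.join()
```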
Made some inline edits to bullet #8 regarding Lua.
Adding gaps relative to RediSearch section Signed-off-by: yairgott <[email protected]>
Fixing heading of the 'Unsupported knobs and control' section Signed-off-by: yairgott <[email protected]>
**Search Operations CUJ** - as a user, I want to perform search operations using commonly available clients.
- Requirement: support index insertion/mutation/deletion/query operations
- Requirement: maintain compatibility with RediSearch VSS APIs, Memorystore APIs, and MemoryDB APIs as much as possible
The RediSearch API is weird. The Pinecone/Milvus API is more natural:
# create
pinecone.create_index(name=index_name, dimension=dimension)
# insert
index = pinecone.Index(index_name)
index.upsert(vectors)
# search
results = index.query(queries=[query_vector], top_k=5)
I believe developers prefer the latter.
So to clarify the difference:
- In the current implementation, you insert keys with other commands and then create an index on top of a prefix structure.
- In the Pinecone world, you create an index and then insert objects into the index explicitly.
Is that right?
In the proposed implementation, once an index is created, it always reflects the current state of the keys that the index covers. This means that the order of key insertion vs index creation is arbitrary, i.e., you can do them in any order.
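To illustrate the order-independence claim, this sketch indexes every key under a prefix, backfilling existing keys at creation and then applying later mutations as they happen. `PrefixIndex` and `build` are hypothetical, and vectors are plain Python lists; the point is only that both orderings end in the same index state.

```python
from typing import Dict, List, Optional


class PrefixIndex:
    """Index over all keys starting with `prefix`, kept in sync with the keyspace."""

    def __init__(self, prefix: str, keyspace: Dict[str, List[float]]) -> None:
        self.prefix = prefix
        self.entries: Dict[str, List[float]] = {}
        # Backfill: keys written before the index existed are picked up here.
        for key, value in keyspace.items():
            self.on_set(key, value)

    def on_set(self, key: str, value: List[float]) -> None:
        # Keys written after creation arrive through mutation events.
        if key.startswith(self.prefix):
            self.entries[key] = value

    def on_del(self, key: str) -> None:
        self.entries.pop(key, None)


def build(index_first: bool) -> Dict[str, List[float]]:
    keyspace: Dict[str, List[float]] = {}
    writes = [("doc:1", [0.1]), ("doc:2", [0.2]), ("other:1", [0.3])]
    index: Optional[PrefixIndex] = PrefixIndex("doc:", keyspace) if index_first else None
    for key, value in writes:
        keyspace[key] = value
        if index is not None:
            index.on_set(key, value)
    if index is None:
        index = PrefixIndex("doc:", keyspace)
    return index.entries


print(build(True) == build(False))  # True
```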
The following metrics are added to the INFO command.
- **search_total_indexed_hash_keys** (Integer) Total count of HASH keys for all indexes
Should there be a corresponding JSON variant?
- **search_hnsw_create_exceptions_count** (Integer) Count of HNSW creation exceptions.
- **search_hnsw_search_exceptions_count** (Integer) Count of HNSW search exceptions.
- **search_hnsw_remove_exceptions_count** (Integer) Count of HNSW removal exceptions.
- **search_hnsw_add_exceptions_count** (Integer) Count of HNSW addition exceptions.
- **search_hnsw_modify_exceptions_count** (Integer) Count of HNSW modification exceptions.
Still unclear: what are exceptions? How do these fail, and what are end users supposed to do about them? Are these just syntax errors?
This means that for data mutation operations, no cross-cluster communication is required; each node simply updates its local index to reflect the mutation of the local key.
Query operations are accepted by any node in the cluster, and that node is responsible for broadcasting the query across the cluster and merging the results for delivery to the client.
Cross-client communication uses gRPC and protobufs and does not require main-thread interaction.
Is this on a separate port? Does this need to be configurable so that end users can slot it into their system?
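The quoted cluster design (local index updates on write, fan-out and merge on query) can be sketched as a scatter-gather top-k merge. In this Python sketch, in-process callables stand in for the gRPC calls to each node, distances are squared Euclidean, and `scatter_gather` and `make_shard` are hypothetical names. Because each shard answers from its own state at its own moment, the merged result carries no cross-shard consistency guarantee, matching the comment about distributed transactions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]  # (distance, key); smaller distance = better match
ShardSearch = Callable[[List[float], int], List[Hit]]


def make_shard(data: Dict[str, List[float]]) -> ShardSearch:
    """Stand-in for one node's local index: brute-force local top-k."""
    def search(query: List[float], k: int) -> List[Hit]:
        dists = [(sum((a - b) ** 2 for a, b in zip(vec, query)), key)
                 for key, vec in data.items()]
        return heapq.nsmallest(k, dists)
    return search


def scatter_gather(shards: List[ShardSearch], query: List[float], k: int) -> List[Hit]:
    """Fan the query out to every shard, then keep the global top-k."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda search: search(query, k), shards))
    # Each shard returns its local top-k, so the global top-k is among their union.
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


shard_a = make_shard({"a": [0.0], "b": [5.0]})
shard_b = make_shard({"c": [1.0], "d": [2.0]})
print(scatter_gather([shard_a, shard_b], [0.0], k=2))  # [(0.0, 'a'), (1.0, 'c')]
```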
removal of links to redis.io Signed-off-by: yairgott <[email protected]>
Replaces and extends the previous PR for VSS module.