Support for filters in Vector and VectorCypher retrievers #29
# Conflicts: # src/neo4j_genai/retrievers/hybrid.py # src/neo4j_genai/retrievers/vector.py
This is great, so good to see this getting in.
I've got some comments initially, and then I'll test it out locally as well.
src/neo4j_genai/filters.py
        self.node_alias = node_alias

    def lhs(self, field):
        return f"{self.node_alias}.`{field}`"
- Escape any backtick inside the field name by prefixing it with another backtick
- If the field name contains non-alphabetic characters, wrap the whole field name in backticks
name -> name
1name -> `1name`
na`me -> `na``me`
Example of valid Cypher:
MATCH (n:`Hell``o`) WHERE n.`prop``erty` = true RETURN n LIMIT 25
More info: https://neo4j.com/docs/cypher-manual/current/syntax/naming/#symbolic-names-escaping-rules
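The two rules above can be sketched in Python roughly as follows. This is a minimal illustration, not the code that ended up in the PR: the helper name `escape_field` and the "safe name" pattern are assumptions, and the pattern is deliberately conservative (a leading ASCII letter followed by letters, digits, or underscores), so anything else is wrapped in backticks.

```python
import re

# Assumption: a name is treated as safe only if it starts with an ASCII letter
# and continues with ASCII letters, digits, or underscores. Everything else is
# quoted, which is always valid Cypher even when quoting was not strictly needed.
_SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def escape_field(field: str) -> str:
    """Return the field name, backtick-quoted only when necessary (hypothetical helper)."""
    if _SAFE_NAME.fullmatch(field):
        return field
    # Double every backtick inside the name, then wrap the whole name.
    return "`" + field.replace("`", "``") + "`"


def lhs(node_alias: str, field: str) -> str:
    """Build the left-hand side of a filter expression, e.g. node.`1name`."""
    return f"{node_alias}.{escape_field(field)}"


if __name__ == "__main__":
    for name in ("name", "1name", "na`me"):
        print(name, "->", escape_field(name))
```

Quoting only when the pattern fails keeps generated queries readable for the common case while still producing valid Cypher for unusual property names.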
Interesting, thanks! Wdyt about always wrapping the field name in backticks, whether it contains special characters or not? In your example it means that
name -> `name`
I'd rather see that we don't do that. It makes the queries more noisy when debugging and teaches bad practices.
I understand your point, and at the same time I'm a bit scared that we miss some special characters and produce an invalid query "just" to make it nice. Let me check if we can find something really robust.
See here, if you think the test cases are covering enough use cases: 2b89aff#diff-00b9f6996b286b97ed1b966f8596505071f2876cac256d97e31bb0de4449b198R53-R69
Yeah, I see your concern. But if anyone should be able to do the right thing it should be us :)
This should give us confidence: https://github.com/neo4j/cypher-builder/blob/main/src/utils/escape.ts#L54-L63
Fair enough, used the same regex to test the field validity.
# Conflicts: # src/neo4j_genai/retrievers/vector.py
Great work! On thread-safety, we concluded that the usage of ParameterStore is fine as it is instantiated in get_metadata_filter(). We have no plans on using multithreading inside get_metadata_filter(), so this shouldn't block this PR.
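To illustrate why per-call instantiation sidesteps the thread-safety question, here is a rough sketch. `SketchParameterStore` and `build_filter` are hypothetical stand-ins invented for this example; they are not the library's `ParameterStore` or `get_metadata_filter()`, whose actual interfaces are not shown in this thread.

```python
from typing import Any, Dict, Tuple


class SketchParameterStore:
    """Hypothetical stand-in: collects query parameters under unique names."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def add(self, value: Any) -> str:
        # The next name is derived from how many parameters are stored so far.
        name = f"param_{len(self.params)}"
        self.params[name] = value
        return name


def build_filter(
    filters: Dict[str, Any], node_alias: str = "node"
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: turn {field: value} into a WHERE fragment plus parameters."""
    # A fresh store is created on every call, so concurrent callers never
    # share mutable state, even though the store itself has no locking.
    store = SketchParameterStore()
    # Assumption: field names are already safe to embed (no escaping done here).
    clauses = [
        f"{node_alias}.{field} = ${store.add(value)}"
        for field, value in filters.items()
    ]
    return " AND ".join(clauses), store.params


if __name__ == "__main__":
    print(build_filter({"year": 1999, "genre": "drama"}))
```

The store would only become a concern if a single instance were shared across threads, or if the function itself fanned work out to multiple threads.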
Awesome 🌊
Add support for pre-filtering for the Vector and VectorCypher retrievers