Support for filters in Vector and VectorCypher retrievers #29

Merged
merged 41 commits into from
May 14, 2024

stellasia
Contributor

@stellasia stellasia commented May 6, 2024

Add support for pre-filtering for the Vector and VectorCypher retrievers
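Pre-filtering means the metadata filter restricts the candidate nodes before the similarity ranking runs, so `top_k` results are drawn only from matching nodes. The sketch below shows one way to express that in Cypher. It is not the PR's code: it assumes an exact (brute-force) search with Neo4j's `vector.similarity.cosine()` function rather than the vector index. The helper `build_prefiltered_query`, the `Document` label, the `embedding` property and the `$year`, `$query_vector` and `$top_k` parameters are all hypothetical.

```python
def build_prefiltered_query(label: str, embedding_property: str, where_clause: str) -> str:
    """Hypothetical helper: exact vector search over nodes that pass a filter."""
    # `label` and `embedding_property` are assumed to be safe or already
    # escaped identifiers; `where_clause` should reference query parameters
    # (e.g. $year) instead of inlining user-supplied values.
    return "\n".join([
        f"MATCH (node:{label})",
        # The filter is applied in the MATCH's WHERE, before any scoring.
        f"WHERE node.{embedding_property} IS NOT NULL AND ({where_clause})",
        f"WITH node, vector.similarity.cosine(node.{embedding_property}, $query_vector) AS score",
        "ORDER BY score DESC",
        "LIMIT $top_k",
        "RETURN node, score",
    ])


query = build_prefiltered_query("Document", "embedding", "node.year >= $year")
print(query)
```

The generated query would be run with `query_vector`, `top_k` and `year` supplied as query parameters, so filter values never end up inside the query string.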

@stellasia stellasia requested review from willtai and oskarhane May 6, 2024 16:23
@stellasia stellasia marked this pull request as ready for review May 7, 2024 14:29
Member

@oskarhane oskarhane left a comment

This is great, so good to see this getting in.
I've got some comments initially, and then I'll test it out locally as well.

src/neo4j_genai/filters.py (outdated)
```python
        self.node_alias = node_alias

    def lhs(self, field):
        return f"{self.node_alias}.`{field}`"
```
Member

@oskarhane oskarhane May 7, 2024

  1. Escape every backtick in the field name by prefixing it with another backtick.
  2. If the field name contains any non-alphabetic characters, wrap the whole field name in backticks.

name -> name
1name -> `1name`
na`me -> `na``me`

Example of valid Cypher:

MATCH (n:`Hell``o`) WHERE n.`prop``erty` = true RETURN n LIMIT 25

More info: https://neo4j.com/docs/cypher-manual/current/syntax/naming/#symbolic-names-escaping-rules
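The two rules above can be sketched as follows. The function names `escape_field` and `lhs` are hypothetical (this `lhs` is a free function rather than the method in the diff), and restricting rule 2 to ASCII letters is an assumption made so the check errs on the side of quoting.

```python
def escape_field(field: str) -> str:
    """Escape a property name using the two rules from the review comment."""
    # Rule 1: escape every backtick by prefixing it with another backtick.
    escaped = field.replace("`", "``")
    # Rule 2: wrap the whole name in backticks if it contains any
    # non-alphabetic character (checked against ASCII letters only).
    if field.isascii() and field.isalpha():
        return escaped
    return f"`{escaped}`"


def lhs(node_alias: str, field: str) -> str:
    """Left-hand side of a filter condition, e.g. node.name."""
    return f"{node_alias}.{escape_field(field)}"


for name in ["name", "1name", "na`me"]:
    print(f"{name} -> {escape_field(name)}")
# name -> name
# 1name -> `1name`
# na`me -> `na``me`
```

Read literally, rule 2 also quotes names such as `first_name` or `name1` that are valid unquoted identifiers. That output is still valid Cypher, just noisier than necessary.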

Contributor Author

Interesting, thanks! Wdyt about always wrapping the field name in backticks, whether or not it contains special characters? In your example, that would give:

name -> `name`

Member

@oskarhane oskarhane May 8, 2024

I'd rather we didn't do that. It makes the queries noisier when debugging and teaches bad practices.

Contributor Author

I understand your point, but I'm a bit worried that we'll miss some special characters and produce an invalid query "just" to make it look nice. Let me check whether we can find something really robust.

Contributor Author

See here, and let me know whether you think the test cases cover enough use cases: 2b89aff#diff-00b9f6996b286b97ed1b966f8596505071f2876cac256d97e31bb0de4449b198R53-R69

Member

Yeah, I see your concern. But if anyone should be able to do the right thing it should be us :)
This should give us confidence: https://github.com/neo4j/cypher-builder/blob/main/src/utils/escape.ts#L54-L63

Contributor Author

Fair enough, I used the same regex to check whether a field name is valid without escaping.
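A regex-based version of this check might look like the sketch below. The pattern is an assumption modeled on the linked cypher-builder `escape.ts` (an ASCII letter or underscore, followed by letters, digits or underscores, matched case-insensitively); check the linked file for the exact expression. The name `escape_if_needed` is hypothetical.

```python
import re

# Assumed pattern: a letter or underscore, then letters, digits or underscores.
_SAFE_NAME = re.compile(r"[a-z_][0-9a-z_]*", re.IGNORECASE)


def escape_if_needed(name: str) -> str:
    """Return `name` unchanged if it is a plain identifier, else quote it."""
    # fullmatch() rather than match() with "^...$": in Python, "$" also
    # matches just before a trailing newline, so "name\n" would slip through.
    if _SAFE_NAME.fullmatch(name):
        return name
    # Double any embedded backticks, then wrap the whole name.
    return "`" + name.replace("`", "``") + "`"


for name in ["name", "first_name", "name1", "1name", "na`me", "my field"]:
    print(f"{name!r} -> {escape_if_needed(name)}")
```

Unlike the "non-alphabetic" rule, this leaves valid identifiers such as `first_name` and `name1` unquoted, which keeps debug output clean while still quoting anything unusual.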

src/neo4j_genai/retrievers/base.py (outdated)
src/neo4j_genai/filters.py (outdated)
src/neo4j_genai/filters.py (outdated)
Contributor

@willtai willtai left a comment

Great work! On thread-safety, we concluded that the use of ParameterStore is fine, since it is instantiated inside get_metadata_filter() and therefore not shared between calls. We have no plans to use multithreading inside get_metadata_filter(), so this shouldn't block this PR.
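Why a per-call instance avoids thread-safety issues can be sketched as follows. This `ParameterStore` and this equality-only `get_metadata_filter` are hypothetical stand-ins, not the library's classes: each call builds its own store, so the parameter counter and values dict are never shared between concurrent calls. A single shared (e.g. module-level) store, by contrast, could let two threads interleave and hand out colliding parameter names.

```python
import re
from typing import Any

_SAFE_NAME = re.compile(r"[a-z_][0-9a-z_]*", re.IGNORECASE)


class ParameterStore:
    """Hypothetical stand-in: hands out unique parameter names for one query."""

    def __init__(self) -> None:
        self.params: dict[str, Any] = {}

    def add(self, value: Any) -> str:
        # The counter lives on the instance, so names are unique per query.
        name = f"param_{len(self.params)}"
        self.params[name] = value
        return name


def get_metadata_filter(
    filters: dict[str, Any], node_alias: str = "node"
) -> tuple[str, dict[str, Any]]:
    """Hypothetical equality-only filter builder: {"year": 2020} -> WHERE clause."""
    # A fresh store per call: concurrent calls never share state, so no lock.
    store = ParameterStore()
    conditions = []
    for field, value in filters.items():
        if not _SAFE_NAME.fullmatch(field):
            field = "`" + field.replace("`", "``") + "`"
        # Values go into query parameters, never into the query text.
        param = store.add(value)
        conditions.append(f"{node_alias}.{field} = ${param}")
    return " AND ".join(conditions), store.params


where, params = get_metadata_filter({"year": 2020, "my genre": "sci-fi"})
print(where)   # node.year = $param_0 AND node.`my genre` = $param_1
print(params)  # {'param_0': 2020, 'param_1': 'sci-fi'}
```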

Member

@oskarhane oskarhane left a comment

Awesome 🌊

@stellasia stellasia merged commit e30fa26 into main May 14, 2024
9 checks passed
@stellasia stellasia deleted the pre-filters branch May 14, 2024 09:26
3 participants