Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend #6619

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

rabbull
Copy link
Contributor

@rabbull rabbull commented Nov 19, 2024

This PR introduces the contains filter operator for the SQLite backend, implemented using custom functions rather than SQLAlchemy's high-level abstractions. Comprehensive tests have been included to cover all potential use cases for contains on the SQLite backend, providing a detailed demonstration of its behavior across various scenarios.

The primary goal of this PR is to replicate the functionality of the contains operator from PostgreSQL on SQLite. However, due to the complex and sometimes unexpected behavior of PostgreSQL's containment operator (#6618), a pure SQL implementation would be highly complex and difficult to maintain. To address this, custom functions were chosen, leveraging the flexibility of high-level programming languages (Python, in this case) to handle complex control flows more effectively.

While invoking functions from other languages introduces potential performance limitations, this trade-off is acceptable compared to the alternative of lacking this functionality entirely. Benchmarks will be added to assess the performance impact and ensure it remains within acceptable bounds.

This implementation serves as a proof of concept to evaluate the feasibility of using custom functions for the contains operator. While functional, the current code structure is still preliminary and requires significant cleanup. If this approach proves successful, a subsequent refactor will address technical debt and improve the maintainability of the codebase. This PR is marked as WIP while the approach is evaluated, benchmarks are added, and the necessary refactor is performed.

Closes #6494

@rabbull rabbull changed the title [WIP] ORM: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend [WIP] QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend Nov 19, 2024
Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 41.17647% with 20 lines in your changes missing coverage. Please review.

Project coverage is 77.80%. Comparing base (ef60b66) to head (ffa0b11).
Report is 142 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/storage/sqlite_zip/utils.py 20.84% 19 Missing ⚠️
src/aiida/storage/sqlite_zip/orm.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6619      +/-   ##
==========================================
+ Coverage   77.51%   77.80%   +0.30%     
==========================================
  Files         560      567       +7     
  Lines       41444    42177     +733     
==========================================
+ Hits        32120    32811     +691     
- Misses       9324     9366      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's maybe shelve this for now, until we have resolved contains for PSQL? Also, just out of curiosity, any reason for going through the custom implementation, rather than through SQLA, apart from the weird behavior for PSQL?

If we agree that the behavior for PSQL is fine, I'd vouch that we also go the SQLA route here. We just need to make sure the behavior is consistent for both backends, as well as document it somewhere.

@@ -48,12 +48,43 @@ def sqlite_case_sensitive_like(dbapi_connection, _):
cursor.close()


def _contains(lhs: Union[dict, list], rhs: Union[dict, list]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe annotate for iterable in general, not just list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using iterable might be too broad. A JSON array is more like a list of finite length, but a iterable may be infinite, which doesn't express the attributes of JSON arrays percisely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thank you for explaining the motivation!

@rabbull
Copy link
Contributor Author

rabbull commented Nov 22, 2024

any reason for going through the custom implementation, rather than through SQLA, apart from the weird behavior for PSQL?

The main reason for opting for a custom implementation over SQLA is it's too hard to implement such an operator in SQL, as well as in SQLA, due to the insufficient support of recursive operations and effective branching in SQL. Even if a viable solution could be carried out, it is most likely to be unreadable and unmaintainable.

@rabbull
Copy link
Contributor Author

rabbull commented Nov 26, 2024

Hi @GeigerJ2,

I’ve updated the tests for this feature by combining all tests for JSON filters dedicated on the SQLite backend into those on the PostgreSQL backend, and removed requires_psql marks for related test cases. This migration also helped uncover and fix a few bugs in the implementation of JSON filters for the PostgreSQL backend.

As for the placement of the custom function _json_contains, I believe its current position is acceptable for now. Since everything related to sqlite_dos is currently located in the sqlite_zip package, which doesn't seem a proper structure for me and is planned to be refactored in the near future (#6607), a partial refactor doesn't look worthwhile.

Please kindly review the new modification when you are available.

@rabbull rabbull changed the title [WIP] QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend QueryBuilder: Implementation of contains Filter Operator applied to JSON Fields for SQLite backend Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QueryBuilder: 'contains' operator not implemented for SQLite storage backends
2 participants