-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQLAlchemy: Fix SQL statement caching for CrateDB's OBJECT
type
#559
Conversation
7c1dec5
to
cbb1053
Compare
class _Craty(sqltypes.UserDefinedType): | ||
cache_ok = True | ||
cache_ok = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The effective change is this one. The other code is for testing only.
-- https://docs.sqlalchemy.org/en/20/core/type_api.html#sqlalchemy.types.ExternalType.cache_ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to eventually re-enable SQL statement caching in this context, it may be advisable to look at SQLAlchemy's implementation for PostgreSQL's JSONB type, and its contains()
method, or at the corresponding implementation for MSSQL's JSON type and the Comparator.as_json()
accessor it is using for that purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played a bit with the _Craty
implementation for OBJECT
types within the crate.client.sqlalchemy.types
module, trying to make this Python statement based on Comparator.as_integer()
work, which I believe would be correct in that sense, and where I would hope the SQLAlchemy machinery would handle it well, without getting the statement caching wrong.
selectable = sa.select(Character).where(Character.data['x'].as_integer() == 1)
However, I got different exceptions, so I think it will need further investigations to make this work. In that manner, I suggest to postpone it to a later iteration, and release the bugfix without further ado.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GH-560 will track what has been suggested above.
""" | ||
The SQLAlchemy implementation of CrateDB's `ARRAY` type in form of the | ||
`ObjectArray`, does *not* offer indexed access to the instance's content. | ||
Thus, using `cache_ok = True` on that type should be sane, and not mess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but now we disable cache for both cases, no? and it's not configurable by the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caching of compiled statements is only turned off for OBJECT
types, implemented on behalf of _Craty
. See #559 (review).
It is not configurable by the user, unless she would be monkeypatching it. It is a good thing, because otherwise, it will return wrong results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies if I'm still confused, but Character
is still a custom type, I don't get when this cache is used and when not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, and thanks for asking. Maybe this short introduction to SQLAlchemy's SQL compilation caching helps to understand the problem scope.
SQLAlchemy as of version 1.4 includes a SQL compilation caching facility which will allow Core and ORM SQL constructs to cache their stringified form, along with other structural information used to fetch results from the statement, allowing the relatively expensive string compilation process to be skipped when another structurally equivalent construct is next used. This system relies upon functionality that is implemented for all SQL constructs, including objects such as Column, select(), and TypeEngine objects, to produce a cache key which fully represents their state to the degree that it affects the SQL compilation process.
-- https://docs.sqlalchemy.org/en/20/faq/performance.html ff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, accessing the OBJECT
implementation _Craty
using dict-like indexing breaks that machinery.
sa.select(Character).where(Character.data['x'] == 1)
sa.select(Character).where(Character.data['y'] == 2)
Second statement uses cached SQL statement from first one, and things go south. As far as I've investigated, it is only happening on this occasion, and not with other data types which do not offer indexed access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 but please squash as commented, thanks.
|
||
selectable = sa.select(Character).where(Character.data['y'] == 2) | ||
result = self.session.execute(selectable).scalar_one().data | ||
self.assertEqual({"y": 2}, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this test then failing without the followup commit to change cache_ok
to False
? Such both commits should be squashed to avoid having failing tests inside our commit history. (e.g. using git bisect
to identify some broken case would raise false positive failures)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Commits have been squashed now. Thanks.
The SQLAlchemy implementation of CrateDB's `OBJECT` type offers indexed access to the instance's content in form of a dictionary. Thus, it must not use `cache_ok = True` on its implementation, i.e. this part of the compiled SQL clause must not be cached. Tests: Add integration test cases verifying SA's SQL statement caching Specifically, the special types `OBJECT` and `ARRAY` are of concern here.
cbb1053
to
bb192b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx!
About
The SQLAlchemy implementation of CrateDB's
OBJECT
type offers indexed access to the instance's content in form of a dictionary. Thus, it must not usecache_ok = True
on its implementation, i.e. to signal this part of the compiled SQL clause must not be cached.Problem
Currently, when using those statements subsequently, based on a corresponding ORM schema definition, the second one will yield the wrong results, because it will use the compiled SQL statement of the first one, as it has been cached wrongly. This flaw has been reported by @faymarie, thank you very much.
Details
The accompanying test case verifies that two subsequent
SELECT
statements are translated well, and don't trip on incorrect SQL compiled statement caching. Because I have not been able to catch that error by unit tests based on mocking, this test case has now been introduced as a first integration test to the SQLAlchemy dialect subsystem.