Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy: Fix SQL statement caching for CrateDB's OBJECT type #559

Merged
merged 1 commit into from
Jul 5, 2023

Conversation

amotl
Copy link
Member

@amotl amotl commented Jul 4, 2023

About

The SQLAlchemy implementation of CrateDB's OBJECT type offers indexed access to the instance's content in form of a dictionary. Thus, it must not use cache_ok = True on its implementation, i.e. to signal this part of the compiled SQL clause must not be cached.

Problem

Currently, when using those statements subsequently, based on a corresponding ORM schema definition, the second one will yield the wrong results, because it will use the compiled SQL statement of the first one, as it has been cached wrongly. This flaw has been reported by @faymarie, thank you very much.

class Character(Base):
    __tablename__ = 'characters'
    name = sa.Column(sa.String, primary_key=True)
    age = sa.Column(sa.Integer)
    data = sa.Column(Object)
sa.select(Character).where(Character.data['x'] == 1)
sa.select(Character).where(Character.data['y'] == 2)

Details

The accompanying test case verifies that two subsequent SELECT statements are translated well, and don't trip on incorrect SQL compiled statement caching. Because I have not been able to catch that error by unit tests based on mocking, this test case has now been introduced as a first integration test to the SQLAlchemy dialect subsystem.

@amotl amotl force-pushed the amo/fix-object-caching branch 3 times, most recently from 7c1dec5 to cbb1053 Compare July 4, 2023 08:27
Comment on lines 134 to +135
class _Craty(sqltypes.UserDefinedType):
cache_ok = True
cache_ok = False
Copy link
Member Author

@amotl amotl Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effective change is this one. The other code is for testing only.

-- https://docs.sqlalchemy.org/en/20/core/type_api.html#sqlalchemy.types.ExternalType.cache_ok

Copy link
Member Author

@amotl amotl Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to eventually re-enable SQL statement caching in this context, it may be advisable to look at SQLAlchemy's implementation for PostgreSQL's JSONB type, and its contains() method, or at the corresponding implementation for MSSQL's JSON type and the Comparator.as_json() accessor it is using for that purpose.

-- https://stackoverflow.com/a/39467149

Copy link
Member Author

@amotl amotl Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played a bit with the _Craty implementation for OBJECT types within the crate.client.sqlalchemy.types module, trying to make this Python statement based on Comparator.as_integer() work, which I believe would be correct in that sense, and where I would hope the SQLAlchemy machinery would handle it well, without getting the statement caching wrong.

selectable = sa.select(Character).where(Character.data['x'].as_integer() == 1)

However, I got different exceptions, so I think it will need further investigations to make this work. In that manner, I suggest to postpone it to a later iteration, and release the bugfix without further ado.

Copy link
Member Author

@amotl amotl Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GH-560 will track what has been suggested above.

@amotl amotl requested review from mfussenegger and matriv July 4, 2023 08:56
@amotl amotl marked this pull request as ready for review July 4, 2023 08:56
@amotl amotl requested review from seut and matriv and removed request for matriv July 4, 2023 11:11
"""
The SQLAlchemy implementation of CrateDB's `ARRAY` type in form of the
`ObjectArray`, does *not* offer indexed access to the instance's content.
Thus, using `cache_ok = True` on that type should be sane, and not mess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but now we disable cache for both cases, no? and it's not configurable by the user?

Copy link
Member Author

@amotl amotl Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching of compiled statements is only turned off for OBJECT types, implemented on behalf of _Craty. See #559 (review).

It is not configurable by the user, unless she would be monkeypatching it. It is a good thing, because otherwise, it will return wrong results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I'm still confused, but Character is still a custom type, I don't get when this cache is used and when not.

Copy link
Member Author

@amotl amotl Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, and thanks for asking. Maybe this short introduction to SQLAlchemy's SQL compilation caching helps to understand the problem scope.

SQLAlchemy as of version 1.4 includes a SQL compilation caching facility which will allow Core and ORM SQL constructs to cache their stringified form, along with other structural information used to fetch results from the statement, allowing the relatively expensive string compilation process to be skipped when another structurally equivalent construct is next used. This system relies upon functionality that is implemented for all SQL constructs, including objects such as Column, select(), and TypeEngine objects, to produce a cache key which fully represents their state to the degree that it affects the SQL compilation process.

-- https://docs.sqlalchemy.org/en/20/faq/performance.html ff.

Copy link
Member Author

@amotl amotl Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, accessing the OBJECT implementation _Craty using dict-like indexing breaks that machinery.

sa.select(Character).where(Character.data['x'] == 1)
sa.select(Character).where(Character.data['y'] == 2)

Second statement uses cached SQL statement from first one, and things go south. As far as I've investigated, it is only happening on this occasion, and not with other data types which do not offer indexed access.

Copy link
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but please squash as commented, thanks.


selectable = sa.select(Character).where(Character.data['y'] == 2)
result = self.session.execute(selectable).scalar_one().data
self.assertEqual({"y": 2}, result)
Copy link
Member

@seut seut Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this test then failing without the followup commit to change cache_ok to False? Such both commits should be squashed to avoid having failing tests inside our commit history. (e.g. using git bisect to identify some broken case would raise false positive failures)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Commits have been squashed now. Thanks.

The SQLAlchemy implementation of CrateDB's `OBJECT` type offers indexed
access to the instance's content in form of a dictionary. Thus, it must
not use `cache_ok = True` on its implementation, i.e. this part of the
compiled SQL clause must not be cached.

Tests: Add integration test cases verifying SA's SQL statement caching
Specifically, the special types `OBJECT` and `ARRAY` are of concern here.
@amotl amotl force-pushed the amo/fix-object-caching branch from cbb1053 to bb192b5 Compare July 5, 2023 14:48
Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx!

@amotl amotl merged commit 640a2db into master Jul 5, 2023
@amotl amotl deleted the amo/fix-object-caching branch July 5, 2023 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants