SQLAlchemy 2.0 upgrades (part 5) #16932

jdavcs · 2023-10-26T15:54:25Z

Builds on #16852 (34 commits).
SQLAlchemy 2.0 compatibility upgrades. Ref #12541.

The main part of this PR is the _get_nested_collection_attributes method. It is dynamic and the structure of the result depends on the input. The method uses a Query object which must be replaced with a Select.

There are 2 main challenges here:

Unlike Query, which may return tuples of items or mapped objects, a Select always returns tuples (ref). The usual workaround is to call scalars(), which will simply return the first item from each tuple, turning [(model,), ...] into [model, ...]. However, this method, depending on input, may return tuples of items, tuples of items and mapped objects, and mapped objects. Therefore, the burden of restructuring the result has to be shifted to the caller.
The usage of DISTINCT requires that all columns appearing in the ORDER BY clause must also appear in the SELECT clause (see more details in commit description: c0f6b1d). We can easily add the missing columns to the select statement; however, that alters the structure of the returned values. Removing the added columns (without breaking some of the calls) is a surprisingly hard problem (the suggested approach doesn't work with a complex statement with joins and the metadata attribute which we handle differently, but which still appears in the select clause (I didn't find a way to do it using SA's public API). The alternative is to leave the added columns in the select clause - that works for the most part. The exception is the case when the columns are added dynamically and there is no way to determine how many of them will be added (depends on collection nesting): that, coupled with accessing items in the result by relative position (like here) poses the following challenge: with code like row[:-3] we intend to chop-off the last 3 elements, because we don't know how many precede them. However, with adding columns from the order by clause, we no longer know how many columns have been appended (added to the right of the list). Thus, we have a list that is dynamic "on both ends", so accessing items by relative position doesn't work. The workaround is the find_identifiers helper.
For the rest of the cases, we use a throwaway variable *_ when destructuring the result.

Of course, this reduces the number of SA's RemovedIn20 warnings.

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

I double checked the performance of "exists().where(criteria)" vs. "select(foo).where(criteria).limit(1)" with explain analyze. While the startup cost is 40% higher for exists, the total costs are identical. In terms of readability, "exists" is more succinct and straightforward.

This is a step towards converting the _get_nested_collection_attributes method from Query to Select. The add_entity method does not exist on Select; add_column should work the same way, in theory...

1. Upgrade Query to Select 2. Factor out query-building logic. The previous version returned tuples of items OR models (ORM objects), depending on the calling code (several similar data access methods were combined into this one generic method in PR galaxyproject#12056). The Query object would "magically" convert tuples of ORM objects to ORM objects. The new unified Select object does not do that. As as result, with Select, this method would return tuples of items or tuples of models (not models): result1 = session.execute(statement2) result1 == [("element_identifier_0", "element_identifier_1", "extension", "state"), ...] result2 = session.execute(statement2) result2 == [(dataset1,), (dataset2,) ...] Factoring out the query-building logic and having the caller execute it depending on the expected data structure solves this.

Required if using DISTINCT

lib/galaxy/model/__init__.py

jdavcs · 2023-11-29T14:55:18Z

Thanks for the review + suggestions, @mvdbeek!

jdavcs added kind/refactoring cleanup or refactoring of existing code, no functional changes area/database Galaxy's database or data access layer labels Oct 26, 2023

jdavcs added this to the 23.2 milestone Oct 26, 2023

jdavcs mentioned this pull request Oct 26, 2023

SQLAlchemy 2.0 compatibility #12541

Open

28 tasks

jdavcs force-pushed the dev_sa20_fix21 branch from 8c88538 to 3f5b1d0 Compare November 1, 2023 21:55

jdavcs mentioned this pull request Nov 10, 2023

SQLAlchemy 2.0 upgrades (part 4) #16852

Merged

4 tasks

jdavcs force-pushed the dev_sa20_fix21 branch 4 times, most recently from 9d3fab8 to fb46470 Compare November 16, 2023 18:43

jdavcs added 14 commits November 27, 2023 18:30

SA2.0: Do not bind MetaData: must use engine explicitly

fa5adb0

Use add_columns instead of add_entity

f5c93cd

This is a step towards converting the _get_nested_collection_attributes method from Query to Select. The add_entity method does not exist on Select; add_column should work the same way, in theory...

Fix dataset_action_tuples

0256f3c

Fix dataset_states_and_extensions_summary

762f119

Fix has_deferred_data

67d7318

Drop unused property

851334e

Fix element_identifiers_extensions_paths_and_metadata_files

8ece420

Fix dataset_instances

408f6e9

Fix dataset_elements

5b31248

Fix replace_failed_elements

007c83f

Rename local vars

b9c0176

Add columns from ORDER BY clause to SELECT clause

8361313

Required if using DISTINCT

jdavcs force-pushed the dev_sa20_fix21 branch 2 times, most recently from e5cbd79 to d695173 Compare November 27, 2023 23:33

Add unit test; adjust tests to account for order-by columns

f45e6d1

jdavcs force-pushed the dev_sa20_fix21 branch from d695173 to f45e6d1 Compare November 27, 2023 23:39

jdavcs marked this pull request as ready for review November 28, 2023 00:36

jdavcs requested a review from mvdbeek November 28, 2023 00:37

jdavcs requested a review from a team November 28, 2023 00:37

mvdbeek reviewed Nov 28, 2023

View reviewed changes

lib/galaxy/model/__init__.py Outdated Show resolved Hide resolved

mvdbeek reviewed Nov 28, 2023

View reviewed changes

lib/galaxy/model/__init__.py Outdated Show resolved Hide resolved

Use distinct only for dataset states/extensions summary

eeeaf7b

jdavcs force-pushed the dev_sa20_fix21 branch from 55e03da to eeeaf7b Compare November 28, 2023 20:19

mvdbeek approved these changes Nov 29, 2023

View reviewed changes

jdavcs merged commit fc0a266 into galaxyproject:dev Nov 29, 2023
48 of 49 checks passed

mvdbeek modified the milestones: 23.2, 24.0 Dec 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQLAlchemy 2.0 upgrades (part 5) #16932

SQLAlchemy 2.0 upgrades (part 5) #16932

jdavcs commented Oct 26, 2023 •

edited

Loading

jdavcs commented Nov 29, 2023

SQLAlchemy 2.0 upgrades (part 5) #16932

SQLAlchemy 2.0 upgrades (part 5) #16932

Conversation

jdavcs commented Oct 26, 2023 • edited Loading

How to test the changes?

License

jdavcs commented Nov 29, 2023

jdavcs commented Oct 26, 2023 •

edited

Loading