Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy 2.0 upgrades (part 5) #16932

Merged
merged 16 commits into from
Nov 29, 2023
Merged

Conversation

jdavcs
Copy link
Member

@jdavcs jdavcs commented Oct 26, 2023

Builds on #16852 (34 commits).
SQLAlchemy 2.0 compatibility upgrades. Ref #12541.

The main part of this PR is the _get_nested_collection_attributes method. It is dynamic and the structure of the result depends on the input. The method uses a Query object which must be replaced with a Select.

There are 2 main challenges here:

  1. Unlike Query, which may return tuples of items or mapped objects, a Select always returns tuples (ref). The usual workaround is to call scalars(), which will simply return the first item from each tuple, turning [(model,), ...] into [model, ...]. However, this method, depending on input, may return tuples of items, tuples of items and mapped objects, and mapped objects. Therefore, the burden of restructuring the result has to be shifted to the caller.
  2. The usage of DISTINCT requires that all columns appearing in the ORDER BY clause must also appear in the SELECT clause (see more details in commit description: c0f6b1d). We can easily add the missing columns to the select statement; however, that alters the structure of the returned values. Removing the added columns (without breaking some of the calls) is a surprisingly hard problem (the suggested approach doesn't work with a complex statement with joins and the metadata attribute which we handle differently, but which still appears in the select clause (I didn't find a way to do it using SA's public API). The alternative is to leave the added columns in the select clause - that works for the most part. The exception is the case when the columns are added dynamically and there is no way to determine how many of them will be added (depends on collection nesting): that, coupled with accessing items in the result by relative position (like here) poses the following challenge: with code like row[:-3] we intend to chop-off the last 3 elements, because we don't know how many precede them. However, with adding columns from the order by clause, we no longer know how many columns have been appended (added to the right of the list). Thus, we have a list that is dynamic "on both ends", so accessing items by relative position doesn't work. The workaround is the find_identifiers helper.
    For the rest of the cases, we use a throwaway variable *_ when destructuring the result.

Of course, this reduces the number of SA's RemovedIn20 warnings.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/refactoring cleanup or refactoring of existing code, no functional changes area/database Galaxy's database or data access layer labels Oct 26, 2023
@jdavcs jdavcs added this to the 23.2 milestone Oct 26, 2023
@jdavcs jdavcs mentioned this pull request Oct 26, 2023
28 tasks
@jdavcs jdavcs mentioned this pull request Nov 10, 2023
4 tasks
@jdavcs jdavcs force-pushed the dev_sa20_fix21 branch 4 times, most recently from 9d3fab8 to fb46470 Compare November 16, 2023 18:43
I double checked the performance of "exists().where(criteria)" vs.
"select(foo).where(criteria).limit(1)" with explain analyze. While the
startup cost is 40% higher for exists, the total costs are identical.
In terms of readability, "exists" is more succinct and straightforward.
This is a step towards converting the _get_nested_collection_attributes
method from Query to Select. The add_entity method does not exist on
Select; add_column should work the same way, in theory...
1. Upgrade Query to Select
2. Factor out query-building logic. The previous version returned tuples
   of items OR models (ORM objects), depending on the calling code
   (several similar data access methods were combined into this one
   generic method in PR galaxyproject#12056). The Query object would "magically"
   convert tuples of ORM objects to ORM objects. The new unified Select
   object does not do that. As as result, with Select, this method would
   return tuples of items or tuples of models (not models):

   result1 = session.execute(statement2)
   result1 == [("element_identifier_0", "element_identifier_1", "extension", "state"), ...]

   result2 = session.execute(statement2)
   result2 == [(dataset1,), (dataset2,) ...]

   Factoring out the query-building logic and having the caller execute
   it depending on the expected data structure solves this.
@jdavcs jdavcs force-pushed the dev_sa20_fix21 branch 2 times, most recently from e5cbd79 to d695173 Compare November 27, 2023 23:33
@jdavcs jdavcs marked this pull request as ready for review November 28, 2023 00:36
@jdavcs jdavcs requested a review from mvdbeek November 28, 2023 00:37
@jdavcs jdavcs requested a review from a team November 28, 2023 00:37
@jdavcs
Copy link
Member Author

jdavcs commented Nov 29, 2023

Thanks for the review + suggestions, @mvdbeek!

@jdavcs jdavcs merged commit fc0a266 into galaxyproject:dev Nov 29, 2023
48 of 49 checks passed
@mvdbeek mvdbeek modified the milestones: 23.2, 24.0 Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/database Galaxy's database or data access layer kind/refactoring cleanup or refactoring of existing code, no functional changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants