SQLAlchemy 2.0 upgrades (part 5) #16932
Merged
jdavcs added labels on Oct 26, 2023: kind/refactoring (cleanup or refactoring of existing code, no functional changes); area/database (Galaxy's database or data access layer)
jdavcs force-pushed the dev_sa20_fix21 branch from 8c88538 to 3f5b1d0 on November 1, 2023 21:55
jdavcs force-pushed the dev_sa20_fix21 branch 4 times, most recently from 9d3fab8 to fb46470 on November 16, 2023 18:43
I double checked the performance of "exists().where(criteria)" vs. "select(foo).where(criteria).limit(1)" with explain analyze. While the startup cost is 40% higher for exists, the total costs are identical. In terms of readability, "exists" is more succinct and straightforward.
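The two query shapes compared above can be sketched as follows. This is a minimal illustration, not the code from the PR: the Foo model, the in-memory SQLite engine, and both helper functions are assumptions made for the example.

```python
from sqlalchemy import Column, Integer, String, create_engine, exists, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Foo(Base):  # hypothetical model, not part of Galaxy
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    name = Column(String)


def has_match_exists(session, name):
    # exists().where(criteria): the database answers with a single boolean.
    stmt = select(exists().where(Foo.name == name))
    return bool(session.execute(stmt).scalar())


def has_match_limit(session, name):
    # select(foo).where(criteria).limit(1): fetch at most one row and test for it.
    stmt = select(Foo.id).where(Foo.name == name).limit(1)
    return session.execute(stmt).scalars().first() is not None


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Foo(name="a"))
    session.commit()
    exists_hit = has_match_exists(session, "a")
    exists_miss = has_match_exists(session, "zzz")
    limit_hit = has_match_limit(session, "a")
    limit_miss = has_match_limit(session, "zzz")
```

Both helpers answer the same question; the exists form states the intent directly, which is the readability point made above.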
This is a step towards converting the _get_nested_collection_attributes method from Query to Select. The add_entity method does not exist on Select; add_column should work the same way, in theory...
1. Upgrade Query to Select
2. Factor out query-building logic.
The previous version returned tuples of items OR models (ORM objects), depending on the calling code (several similar data access methods were combined into this one generic method in PR galaxyproject#12056). The Query object would "magically" convert tuples of ORM objects to ORM objects. The new unified Select object does not do that. As a result, with Select, this method would return tuples of items or tuples of models (not models):
    result1 = session.execute(statement1)
    result1 == [("element_identifier_0", "element_identifier_1", "extension", "state"), ...]
    result2 = session.execute(statement2)
    result2 == [(dataset1,), (dataset2,) ...]
Factoring out the query-building logic and having the caller execute it depending on the expected data structure solves this.
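The pattern described in this commit message might look roughly like the sketch below. The Dataset model, the build_statement function, and the in-memory SQLite engine are stand-ins invented for the example; they are not Galaxy's actual model or method.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Dataset(Base):  # hypothetical stand-in model
    __tablename__ = "dataset"
    id = Column(Integer, primary_key=True)
    extension = Column(String)
    state = Column(String)


def build_statement(return_entities):
    """Only build the Select; the caller decides how to execute it."""
    if return_entities:
        return select(Dataset).order_by(Dataset.id)
    return select(Dataset.extension, Dataset.state).order_by(Dataset.id)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(
        [Dataset(extension="txt", state="ok"), Dataset(extension="bam", state="new")]
    )
    session.commit()

    # A caller that expects tuples of items executes the statement directly.
    attribute_rows = [tuple(row) for row in session.execute(build_statement(False))]

    # Executing an entity statement directly yields 1-tuples: [(model,), ...].
    entity_rows = session.execute(build_statement(True)).all()
    row_lengths = [len(row) for row in entity_rows]

    # A caller that expects models unwraps them with scalars(): [model, ...].
    models = session.execute(build_statement(True)).scalars().all()
    model_extensions = [model.extension for model in models]
```

Because the statement builder never executes anything, each caller can pick execute() or execute().scalars() according to the shape it needs.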
Required if using DISTINCT
jdavcs force-pushed the dev_sa20_fix21 branch 2 times, most recently from e5cbd79 to d695173 on November 27, 2023 23:33
jdavcs force-pushed the dev_sa20_fix21 branch from d695173 to f45e6d1 on November 27, 2023 23:39
mvdbeek reviewed on Nov 28, 2023
mvdbeek reviewed on Nov 28, 2023
jdavcs force-pushed the dev_sa20_fix21 branch from 55e03da to eeeaf7b on November 28, 2023 20:19
mvdbeek approved these changes on Nov 29, 2023
Thanks for the review + suggestions, @mvdbeek!
Builds on #16852 (34 commits).
SQLAlchemy 2.0 compatibility upgrades. Ref #12541.
The main part of this PR is the _get_nested_collection_attributes method. It is dynamic, and the structure of its result depends on the input. The method uses a Query object, which must be replaced with a Select.
There are 2 main challenges here:
1. Unlike Query, which may return tuples of items or mapped objects, a Select always returns tuples (ref). The usual workaround is to call scalars(), which simply returns the first item from each tuple, turning [(model,), ...] into [model, ...]. However, this method, depending on input, may return tuples of items, tuples of items and mapped objects, or mapped objects. Therefore, the burden of restructuring the result has to be shifted to the caller.
2. [...] metadata attribute which we handle differently, but which still appears in the select clause (I didn't find a way to do it using SA's public API). The alternative is to leave the added columns in the select clause; that works for the most part. The exception is the case when the columns are added dynamically and there is no way to determine how many of them will be added (it depends on collection nesting). That, coupled with accessing items in the result by relative position (like here), poses the following challenge: with code like row[:-3] we intend to chop off the last 3 elements, because we don't know how many precede them. However, once columns from the order by clause are added, we no longer know how many columns have been appended (added to the right of the list). Thus, we have a list that is dynamic "on both ends", so accessing items by relative position doesn't work. The workaround is the find_identifiers helper.
For the rest of the cases, we use a throwaway variable *_ when destructuring the result.
Of course, this reduces the number of SA's RemovedIn20 warnings.
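The "dynamic on both ends" problem can be illustrated with plain Python. The find_identifiers function below is a hypothetical sketch written for this example (it picks values by key name); the helper in the PR may work differently. The column names, including extra_order_by_column, are also assumptions.

```python
def find_identifiers(keys, row):
    """Hypothetical sketch: pick identifier values by key name, not by position."""
    return [
        value
        for key, value in zip(keys, row)
        if key.startswith("element_identifier_")
    ]


# Assumed row layout: a variable number of identifiers, then extension and state,
# then one column appended because it appears in the order by clause.
keys = (
    "element_identifier_0",
    "element_identifier_1",
    "extension",
    "state",
    "extra_order_by_column",
)
row = ("outer", "inner", "txt", "ok", 42)

# Slicing by relative position assumed exactly two trailing columns;
# the appended column shifts everything, so "txt" leaks into the identifiers.
naive_identifiers = list(row[:-2])

# Selecting by key name is unaffected by how many columns were appended.
identifiers = find_identifiers(keys, row)

# Where the leading part is known, a throwaway *_ absorbs any appended columns.
extension, state, *_ = row[len(identifiers):]
```

Here naive_identifiers ends up with three elements while identifiers holds the intended two, which is why positional slicing alone is not enough once the tail of the row becomes variable as well.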