-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory consumption increase for anndata objects #363
Open
fhausmann
wants to merge
8
commits into
scverse:main
Choose a base branch
from
fhausmann:memory_fix
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 3 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
4bb72a9
Fix memory consumption increase for anndata objects (#360)
fhausmann bf3e6bc
Add documentation to test, Fix codacy issue
fhausmann 2363113
Fix parent.shape not available after anndata deletion
fhausmann d79ee0a
Include https://github.com/theislab/anndata/pull/366
fhausmann a696ba8
Remove __del__ and parent_shape property
fhausmann e5bbad1
Merge branch 'master' into memory_fix
fhausmann 582b790
Merge branch 'master' into memory_fix
fhausmann 5fff6cf
Merge branch 'main' into memory_fix
ivirshup File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the shape tuple just be an attribute storing a tuple? Previously, I avoided this to reduce redundancy, but I think there's more complexity introduced by all the getters and setters. This would also remove the need for the
AnnData.__del__
method.I was checking to see if this might break inplace subsetting, but it looks like that was already a bit broken:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced this for exactly this case (the anndata object changes shape), which then should be propagated to the
AxisArrays
. We can drop these getters, setters and the__del__
method for sure, when propagated at all functions inducing different shapes explicitly. However, I'm not sure, that's so easy to find them all (or it is only these_inplace_subset_*
functions) ?I guess also on master it's expected that
o1
is changed ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shape of the instance should only change via the
_inplace_subset_
methods.Which properties are you referring to here?
I think it would either make sense that it change, or it no longer referred to the parent of the wrong shape.
Don't feel obligated to fix things that were already broken on master in this PR. If it's not intimately related, it's probably better to open an issue and fix it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example. When we can ensure, that the parent does not change shape other than
_inplace_subset_
, like you said, we can drop theseparent_shape
property of AxisArrays and the__del__
method of anndata, store a tuple of the parent shape instead and update it in the_inplace_subset_
methods when necessary.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the code to drop the
__del__
method and theparent_shape
property.We still need it once in the
AlignedMapping
class otherwise, a lot of subclasses has to be changed to.parent_shape
does not need to be updated in_inplace_subsets_
because they get newly created there.Additionally I had a closer look to your solution above.
However, then
_obsm
cannot beAxisArrays
, otherwise you still have the circularity and if it's not anAxisArrays
it breaks a lot of other functions and would require a lot of code changes to fix it or am I wrong ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great, thanks. I created a branch here: https://github.com/fhausmann/anndata/tree/memory_fix_lazy_obsm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see what's happening. We try to normalize passed indices to
slice
,int
,array[int]
, orarray[bool]
types as soon as possible. The normalized indices are stored in view in the_oidx
and_vidx
attributes.AxisArray._view
is only expecting to see those types as subset indices. This can be fixed by changing how the view is made in yourobsm
getter to:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed this, including some other small changes. However there is now another issue.
I had to introduce this https://github.com/fhausmann/anndata/blob/aa3e55b5e6251bc373a8ab94e8e11d7fc6d16dba/anndata/_core/aligned_mapping.py#L234-L237
To update the parent when
obsm
is modified.However there still seem to be an issue pointed out by the following test: anndata/tests/test_base.py::test_setting_dim_index[obs]
If you create an anndata object containing
obsm=dict('df'=pd.Dataframe)
, copy it and create a view it looks like as all objects are referring to the same dataframe:id(curr._obsm['df']) == id(orig._obsm['df']) # True
at https://github.com/theislab/anndata/blob/58886f09b2e387c6389a2de20ed0bc7d20d1b843/anndata/tests/test_base.py#L187I think it can be fixed with creating a copy when modifying a value in
_obsm
. However this leads to infinite recursion when trying to create an anndata object from a view inplace.Additionally I think, these changes are now (not only) for fixing the original issue, but changing the anndata architecture. Should we now create a new pull request for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the AxisArray shouldn't have it's own version of the values. That should just be a reference to the values held by the parent. That is, I would think the
__init__
should just do something likeself._data = parent._obsm
.Is the code for this like:
Either way, I agree this doesn't look right. But if
curr
is a view, I'm not sure it should have values for._obsm
. It also looks like the.copy
method isn't actually making a copy of the dataframe if this is true, so that would be another thing to look into.I wouldn't worry too much about "architecture changes". A lot of the work I've done on this package has been to make changing the architecture easier. It's up to you how you'd like to organize this, but I often find starting a fresh PR/ branch helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we close this than ? Or do you think it could be worth considering if the lazy creation doesn't work out ?