Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated _inplace #1115

Closed
wants to merge 4 commits into from
Closed

updated _inplace #1115

wants to merge 4 commits into from

Conversation

Intron7
Copy link
Member

@Intron7 Intron7 commented Aug 31, 2023

Here is an updated version for the inplace function .Layers and .X are now copied inplace without copy.

@Intron7 Intron7 added this to the 0.10.0 milestone Aug 31, 2023
@Intron7 Intron7 requested a review from ivirshup August 31, 2023 09:02
@codecov
Copy link

codecov bot commented Aug 31, 2023

Codecov Report

Merging #1115 (ad5589f) into main (22f33bb) will increase coverage by 0.06%.
The diff coverage is 92.72%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1115      +/-   ##
==========================================
+ Coverage   84.90%   84.96%   +0.06%     
==========================================
  Files          36       36              
  Lines        5133     5182      +49     
==========================================
+ Hits         4358     4403      +45     
- Misses        775      779       +4     
Flag Coverage Δ
gpu-tests 52.00% <7.27%> (-0.46%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
anndata/_core/anndata.py 83.31% <92.72%> (+0.51%) ⬆️

@ivirshup
Copy link
Member

ivirshup commented Aug 31, 2023

Thanks for the PR!

Does this solve the memory issues you were facing in rapids-singlecell? Could you share some memory measurements showing this?

It would also be great to run the scanpy test suite with this branch to battle-test the change.

Comment on lines +1265 to +1289
_, var_dx = _normalize_indices(
(slice(None, None, None), index), self.obs_names, self.var_names
)

if isinstance(var_dx, (int, np.integer)):
if not (-self.n_vars <= var_dx < self.n_vars):
raise IndexError(f"Variable index `{var_dx}` is out of range.")
var_dx += self.n_vars * (var_dx < 0)
var_dx = slice(var_dx, var_dx + 1, 1)
y_dim = _get_dimensions(var_dx, self.shape[1])
self._n_vars = y_dim
if self.X is not None:
self._X = self.X[:, var_dx].reshape(self.n_obs, y_dim)
uns = copy(self._uns)
var_sub = self.var.iloc[var_dx]
self._remove_unused_categories(self.var, var_sub, uns)
self._var = pd.DataFrame(var_sub)
self._uns = uns

def _inplace_subset_obs(self, index: Index1D):
if self.layers:
for key, matrix in self.layers.items():
self.layers[key] = matrix[:, var_dx].reshape(self.n_obs, y_dim)
self._varm = self.varm._view(self, (var_dx,)).copy()
self._varp = self.varp._view(self, var_dx).copy()
self._is_view = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost identical to the other big code block, needs to be deduplicated

@Intron7
Copy link
Member Author

Intron7 commented Sep 1, 2023

I reran the analysis (1Million Cells) with the this code again and some things are really confusing. So the cunndata inplace works and after some promising initial results it looks like this doesn't fix the memory issue. Even though the subsetting is exactly the same. At some point I thought it had to do with views and how .X is presented. So I ran some more tests to see whats happening. Boolean subsets seem to work with .X from anndata. The index somehow doesn't. This doesn't explain why cunndata work because I also use the _normalize_index function from anndata.

@ivirshup
Copy link
Member

ivirshup commented Sep 5, 2023

Right now I'm -1 on this PR since it's only temporarily lower memory usage and because it can leave the anndata in a corrupted state on failure, which is much less likely to happen in our current implementation.

@Intron7
Copy link
Member Author

Intron7 commented Sep 5, 2023

since we found the underlying issue im fine with closing the PR

@Intron7 Intron7 closed this Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants