updated _inplace #1115
for more information, see https://pre-commit.ci
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1115      +/-   ##
==========================================
+ Coverage   84.90%   84.96%   +0.06%
==========================================
  Files          36       36
  Lines        5133     5182      +49
==========================================
+ Hits         4358     4403      +45
- Misses        775      779       +4
Thanks for the PR! Does this solve the memory issues you were facing in rapids-singlecell? Could you share some memory measurements showing this? It would also be great to run the scanpy test suite with this branch to battle-test the change.
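One way to produce the kind of measurement asked for here is to compare peak allocations with the standard-library `tracemalloc` module. The following is only a minimal sketch: a `bytearray` stands in for a data matrix, and `peak_extra_bytes`, `subset_copy` and `subset_inplace` are hypothetical names, not anndata functions. It needs Python 3.9+ for `tracemalloc.reset_peak`.

```python
import tracemalloc


def peak_extra_bytes(subset, n=4_000_000):
    """Return the peak memory allocated above the baseline while `subset` runs."""
    tracemalloc.start()
    try:
        buf = bytearray(n)        # stand-in for a large data matrix (assumption)
        tracemalloc.reset_peak()  # ignore whatever happened before this point
        baseline, _ = tracemalloc.get_traced_memory()
        buf = subset(buf)
        _, peak = tracemalloc.get_traced_memory()
        return peak - baseline
    finally:
        tracemalloc.stop()


def subset_copy(buf):
    # Slicing builds a new buffer while the original is still referenced.
    return buf[: len(buf) // 2]


def subset_inplace(buf):
    # Deleting the tail shrinks the existing buffer instead of creating a new one.
    del buf[len(buf) // 2:]
    return buf


copy_extra = peak_extra_bytes(subset_copy)
inplace_extra = peak_extra_bytes(subset_inplace)
print(f"copy: {copy_extra} extra bytes at peak, in place: {inplace_extra}")
```

For a real AnnData workload the same pattern would wrap the actual subsetting call. Note that `tracemalloc` only sees allocations made through Python's allocators, so GPU memory or memory allocated directly by native libraries would need a different tool.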
        _, var_dx = _normalize_indices(
            (slice(None, None, None), index), self.obs_names, self.var_names
        )

        if isinstance(var_dx, (int, np.integer)):
            if not (-self.n_vars <= var_dx < self.n_vars):
                raise IndexError(f"Variable index `{var_dx}` is out of range.")
            var_dx += self.n_vars * (var_dx < 0)
            var_dx = slice(var_dx, var_dx + 1, 1)
        y_dim = _get_dimensions(var_dx, self.shape[1])
        self._n_vars = y_dim
        if self.X is not None:
            self._X = self.X[:, var_dx].reshape(self.n_obs, y_dim)
        uns = copy(self._uns)
        var_sub = self.var.iloc[var_dx]
        self._remove_unused_categories(self.var, var_sub, uns)
        self._var = pd.DataFrame(var_sub)
        self._uns = uns

    def _inplace_subset_obs(self, index: Index1D):
        if self.layers:
            for key, matrix in self.layers.items():
                self.layers[key] = matrix[:, var_dx].reshape(self.n_obs, y_dim)
        self._varm = self.varm._view(self, (var_dx,)).copy()
        self._varp = self.varp._view(self, var_dx).copy()
        self._is_view = False
almost identical to the other big code block, needs to be deduplicated
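As a rough illustration of what deduplicating could look like, the integer-index handling from the snippet above can be pulled into one helper that takes the axis length and a label. This is only a sketch: `int_index_to_slice` is a hypothetical name, not part of anndata, and it uses `numbers.Integral` (which NumPy integer scalars are registered with) so the example runs without NumPy.

```python
import numbers


def int_index_to_slice(idx, length, axis_label):
    """Turn a single integer index into an equivalent length-1 slice.

    Hypothetical helper for illustration; anything that is not a single
    integer (slices, arrays, masks) is returned unchanged.
    """
    if not isinstance(idx, numbers.Integral):
        return idx
    if not (-length <= idx < length):
        raise IndexError(f"{axis_label} index `{idx}` is out of range.")
    idx += length * (idx < 0)  # map negative indices onto 0..length-1
    return slice(idx, idx + 1, 1)


# The same helper would serve both axes:
print(int_index_to_slice(-1, 5, "Variable"))
print(int_index_to_slice(2, 10, "Observation"))
```

The obs and var code paths would then differ only in which axis length, names and attributes they pass in, which is the part that would still need to be parameterized.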
I reran the analysis (1 million cells) with this code, and some things are really confusing. The cunndata inplace version works, but after some promising initial results it looks like this doesn't fix the memory issue, even though the subsetting is exactly the same. At some point I thought it had to do with views and how
Right now I'm -1 on this PR, since it only temporarily lowers memory usage and because it can leave the anndata in a corrupted state on failure, which is much less likely to happen in our current implementation.
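The corruption concern can be illustrated with a toy container. The sketch below is not AnnData: `ToyContainer` and `subset_vars` are hypothetical, and plain nested lists stand in for matrices. It builds every subset first and only then assigns, so a failure partway through leaves the object as it was.

```python
class ToyContainer:
    """Hypothetical stand-in for an annotated-data object (not AnnData)."""

    def __init__(self, X, layers):
        self.X = X
        self.layers = dict(layers)

    def subset_vars(self, cols):
        # Phase 1: compute all subsets. An exception here leaves self untouched.
        new_X = [[row[j] for j in cols] for row in self.X]
        new_layers = {
            key: [[row[j] for j in cols] for row in matrix]
            for key, matrix in self.layers.items()
        }
        # Phase 2: commit. Only simple attribute assignments remain.
        self.X = new_X
        self.layers = new_layers


good = ToyContainer([[1, 2, 3], [4, 5, 6]], {"raw": [[7, 8, 9], [10, 11, 12]]})
good.subset_vars([0, 2])
print(good.X, good.layers)

# A malformed layer (short second row) makes phase 1 fail before anything changes.
bad = ToyContainer([[1, 2, 3], [4, 5, 6]], {"raw": [[7, 8, 9], [10]]})
try:
    bad.subset_vars([0, 2])
except IndexError:
    print("subsetting failed, X is still", bad.X)
```

The trade-off is that old and new data coexist until the commit, so peak memory is higher. Overwriting attributes one by one avoids that peak but gives up this all-or-nothing behavior.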
Since we found the underlying issue, I'm fine with closing the PR.
Here is an updated version of the inplace function. .layers and .X are now subset in place without making a copy.