(feat): Backed zarr and hdf5 sparse matrices (separate from backed mode) #765
Codecov Report
@@            Coverage Diff             @@
##             main     #765      +/-   ##
==========================================
- Coverage   84.88%   82.79%   -2.09%
==========================================
  Files          36       36
  Lines        5153     5197      +44
==========================================
- Hits         4374     4303      -71
- Misses        779      894     +115
for more information, see https://pre-commit.ci
@ivirshup It seems that we settled on Friday on not exposing anything for
backed support for zarr
At the moment, the first and last are still open-ended, as is the question of setting the data. Otherwise, tests pass and you can convert from a draft IMO. Thanks again for getting this started!
Decisions:
4727f97 to 4187f51
612f781 to 9d293ba
@ilan-gold looking very good!
I've added a few comments, but otherwise can we add a changelog entry?
anndata/_core/sparse_dataset.py
Outdated
    @property
    def value(self) -> ss.spmatrix:
        return self.to_memory()

    def __repr__(self) -> str:
        return (
-           f"<HDF5 sparse dataset: format {self.format_str!r}, "
+           f"<Backed sparse dataset: format {self.format_str!r}, "
How about we put the type name here, so it's more like:
<CSRDataset: backend zarr, shape (10000, 10000), data_dtype '<f8'>
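To make the suggestion concrete, here is a minimal sketch of a `__repr__` that produces that string. The base class `BackedSparseSketch` and its `backend`, `shape`, and `data_dtype` attributes are hypothetical stand-ins for illustration, not the classes in this PR.

```python
class BackedSparseSketch:
    """Hypothetical stand-in for a backed sparse dataset (not anndata's class)."""

    def __init__(self, backend: str, shape: tuple, data_dtype: str):
        self.backend = backend        # assumed attribute, e.g. "zarr" or "hdf5"
        self.shape = shape            # assumed attribute, (n_rows, n_cols)
        self.data_dtype = data_dtype  # assumed attribute, dtype string of stored values

    def __repr__(self) -> str:
        # type(self).__name__ lets each subclass report its own name.
        return (
            f"<{type(self).__name__}: backend {self.backend}, "
            f"shape {self.shape}, data_dtype {self.data_dtype!r}>"
        )


class CSRDataset(BackedSparseSketch):
    pass


class CSCDataset(BackedSparseSketch):
    pass


print(CSRDataset("zarr", (10000, 10000), "<f8"))
# <CSRDataset: backend zarr, shape (10000, 10000), data_dtype '<f8'>
```

Using the type name means the CSR and CSC variants need no separate format field in the repr.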
anndata/_core/sparse_dataset.py
Outdated
    # def __setitem__(self, index: Union[Index, Tuple[()]], value):

    # row, col = self._normalize_index(index)
    # mock_matrix = self._to_backed()
    # mock_matrix[row, col] = value
That sounds good.
I think I prefer deprecation to "Instability". What were you thinking for instability warning?
    path = (
        tmp_path / f"test.{diskfmt.replace('ad', '')}"
    )  # diskfmt is either h5ad or zarr
I think it would be worth adding the negative test too. These should be quite fast as they can be done with small data, right?
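For reference, the path expression in the snippet above maps each `diskfmt` value to a file suffix as sketched below. The helper name `backing_path` is hypothetical and only wraps the expression from the diff so it can be run on its own.

```python
from pathlib import Path


def backing_path(tmp_path: Path, diskfmt: str) -> Path:
    """Hypothetical helper wrapping the path expression from the test diff."""
    # "h5ad".replace("ad", "") gives "h5"; "zarr" contains no "ad" and is unchanged.
    return tmp_path / f"test.{diskfmt.replace('ad', '')}"


for fmt in ("h5ad", "zarr"):
    print(fmt, "->", backing_path(Path("tmp"), fmt).name)
# h5ad -> test.h5
# zarr -> test.zarr
```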
anndata/_core/sparse_dataset.py
Outdated
        format_str = "csc"


    def sparse_dataset(group) -> BaseCompressedSparseDataset:
Since this is now exported in experimental, could it get a docstring with at least one Usage example?
Good point, will do.
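A rough sketch of what such a docstring could look like follows. It assumes dispatch on the group's `encoding-type` attribute (`csr_matrix` or `csc_matrix`). `FakeGroup` is a hypothetical stand-in for an h5py or zarr group, and the class bodies are placeholders rather than the PR's implementation.

```python
class FakeGroup:
    """Hypothetical stand-in for an h5py/zarr group; it only carries attrs."""

    def __init__(self, encoding_type: str):
        self.attrs = {"encoding-type": encoding_type}


class BaseCompressedSparseDataset:
    format_str = None  # set by subclasses

    def __init__(self, group):
        self.group = group


class CSRDataset(BaseCompressedSparseDataset):
    format_str = "csr"


class CSCDataset(BaseCompressedSparseDataset):
    format_str = "csc"


def sparse_dataset(group) -> BaseCompressedSparseDataset:
    """Wrap an on-disk sparse group in the matching dataset class.

    Usage
    -----
    >>> ds = sparse_dataset(FakeGroup("csr_matrix"))
    >>> type(ds).__name__
    'CSRDataset'
    >>> ds.format_str
    'csr'
    """
    # Assumption: the storage format is recorded in the "encoding-type" attribute.
    encoding_type = group.attrs["encoding-type"]
    if encoding_type == "csr_matrix":
        return CSRDataset(group)
    if encoding_type == "csc_matrix":
        return CSCDataset(group)
    raise ValueError(f"Unknown sparse encoding type: {encoding_type!r}")
```

Writing the usage section as a doctest keeps the example from drifting out of date, since it can be run with the rest of the test suite.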
@ivirshup Where is the changelog? Is it the
@ivirshup have a look at the deprecation warning! Thanks!
        PendingDeprecationWarning,
    )
    row, col = self._normalize_index(index)
    mock_matrix = self._to_backed()
-   mock_matrix = self._to_backed()
+   mock_matrix = self.to_backed()
This is currently throwing an error, but I do like the idea of making it private.
It's used publicly.....
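The deprecation path discussed in this thread could look roughly like the sketch below. The class `SettableSketch`, its list-of-lists storage, and the warning message are all hypothetical. Only the use of `PendingDeprecationWarning` inside `__setitem__` comes from the diff.

```python
import warnings


class SettableSketch:
    """Hypothetical stand-in for a backed sparse dataset with item assignment."""

    def __init__(self, rows):
        # Plain nested lists stand in for the on-disk matrix.
        self._rows = [list(r) for r in rows]

    def __setitem__(self, index, value):
        # Hypothetical message; the PR's actual wording is not shown in this thread.
        warnings.warn(
            "__setitem__ on backed sparse datasets may be removed in a future release.",
            PendingDeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        row, col = index
        self._rows[row][col] = value
```

Note that Python ignores `PendingDeprecationWarning` by default, so a test that checks for it needs `pytest.warns` or an explicit `warnings.simplefilter("always")`.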
I'm not sure where I got the idea in my head that this branch was passing CI, but in any case, should we make This is what is causing the failures.
In
- from .sparse_dataset import SparseDataset
- from ..compat import ZarrArray, DaskArray, AwkArray
+ from .sparse_dataset import BaseCompressedSparseDataset
+ from ..compat import ZarrArray, ZarrGroup, DaskArray, AwkArray
@ilan-gold like this
Initial draft of backed sparse array support for zarr.
sparse_dataset, CSRDataset, and CSCDataset class from experimental
.read_zarr