Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat): Backed zarr and hdf5 sparse matrices (separate from backed mode) #765

Merged
merged 35 commits into from
Sep 8, 2023

Conversation

ivirshup
Copy link
Member

@ivirshup ivirshup commented Apr 29, 2022

Initial draft of backed sparse array support for zarr.

  • I intend to export sparse_dataset, CSRDataset, and CSCDataset class from experimental.
  • Get tests passing
  • Figure out if I'm going to expose any arguments through read_zarr

@codecov
Copy link

codecov bot commented Apr 29, 2022

Codecov Report

Merging #765 (ad016a6) into main (88dd129) will decrease coverage by 2.09%.
The diff coverage is 92.74%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #765      +/-   ##
==========================================
- Coverage   84.88%   82.79%   -2.09%     
==========================================
  Files          36       36              
  Lines        5153     5197      +44     
==========================================
- Hits         4374     4303      -71     
- Misses        779      894     +115     
Flag Coverage Δ
gpu-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
anndata/_core/anndata.py 83.60% <66.66%> (ø)
anndata/_core/file_backing.py 90.32% <87.50%> (-1.15%) ⬇️
anndata/_io/specs/methods.py 87.52% <90.00%> (-0.19%) ⬇️
anndata/_core/sparse_dataset.py 92.73% <93.15%> (+1.06%) ⬆️
anndata/_core/raw.py 79.28% <100.00%> (-4.29%) ⬇️
anndata/_io/h5ad.py 92.89% <100.00%> (ø)
anndata/_io/utils.py 76.47% <100.00%> (ø)
anndata/experimental/__init__.py 100.00% <100.00%> (ø)
anndata/experimental/merge.py 87.56% <100.00%> (ø)
anndata/experimental/multi_files/_anncollection.py 70.56% <100.00%> (ø)
... and 2 more

... and 3 files with indirect coverage changes

📢 Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse).

@ilan-gold
Copy link
Contributor

@ivirshup It seems that we settled on friday on not exposing anything for read_zarr. beyond that, i was not clear what the goal of assignment would be here - did you want to include it or not? i remember you saying that writing in backed mode probably wasn't so great anyway so is the goal here to get rid of it?

@ilan-gold ilan-gold changed the title Start backed sparse support for zarr (feat): backed support for zarr Jul 31, 2023
@ilan-gold
Copy link
Contributor

ilan-gold commented Jul 31, 2023

Initial draft of backed sparse array support for zarr.

  • I intend to export sparse_dataset, CSRDataset, and CSCDataset class from experimental.
  • Get tests passing
  • Figure out if I'm going to expose any arguments through read_zarr

At the moment, the first and last are still open ended as is the question of setting the data. Otherwise, tests pass and you can convert from a draft IMO. Thanks again for getting this started!

@ivirshup ivirshup added this to the 0.10.0 milestone Aug 1, 2023
@ilan-gold
Copy link
Contributor

Decisions:

  1. don't expose arguments to read_zarr
  2. do export the classes from experimental.
  3. DeprecationWarning for sparse setting - eventually remove probably, but leave space for a new implementation.

@ivirshup ivirshup changed the title (feat): backed support for zarr (feat): Backed zarr and hdf5 sparse matrices (separate from backed mode) Aug 28, 2023
Copy link
Member Author

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilan-gold looking very good!

I've added a few comments, but otherwise can we add a changelog entry?


@property
def value(self) -> ss.spmatrix:
return self.to_memory()

def __repr__(self) -> str:
return (
f"<HDF5 sparse dataset: format {self.format_str!r}, "
f"<Backed sparse dataset: format {self.format_str!r}, "
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we put the type name here, so it's more like:

<CSRDataset: backend zarr, shape (10000, 10000), data_dtype '<f8'>

Comment on lines 303 to 308
# def __setitem__(self, index: Union[Index, Tuple[()]], value):

# row, col = self._normalize_index(index)
# mock_matrix = self._to_backed()
# mock_matrix[row, col] = value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good.

I think I prefer deprecation to "Instability". What were you thinking for instability warning?

Comment on lines +91 to +93
path = (
tmp_path / f"test.{diskfmt.replace('ad', '')}"
) # diskfmt is either h5ad or zarr
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worth adding the negative test too. These should be quite fast as they can be done with small data, right?

format_str = "csc"


def sparse_dataset(group) -> BaseCompressedSparseDataset:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now exported in experimental could it get a docstring with at least one Usage example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do.

@ilan-gold
Copy link
Contributor

@ivirshup Where is the changelog? Is it the release-notes? And if so, which release?

@ivirshup
Copy link
Member Author

Where is the changelog?

release-notes/0.10.0.md

@ilan-gold
Copy link
Contributor

@ivirshup have look at the deprecation warning! thanks!

PendingDeprecationWarning,
)
row, col = self._normalize_index(index)
mock_matrix = self._to_backed()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mock_matrix = self._to_backed()
mock_matrix = self.to_backed()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently throwing an error, but I do like the idea of making it private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used publicly.....

@ilan-gold
Copy link
Contributor

ilan-gold commented Sep 8, 2023

I'm not sure where I got the idea in my head that this branch was passing CI but in any case, should we make zarr a dep of anndata now? It's only used for type checking right now in sparse_dataset.py

This is what is causing the failures.

@ivirshup
Copy link
Member Author

ivirshup commented Sep 8, 2023

In anndata.compat there are dummy classes defined for this. Basically replace zarr.Array and zarr.Group with anndata.compat.ZarrArray and anndata.compat.ZarrGroup

from .sparse_dataset import SparseDataset
from ..compat import ZarrArray, DaskArray, AwkArray
from .sparse_dataset import BaseCompressedSparseDataset
from ..compat import ZarrArray, ZarrGroup, DaskArray, AwkArray
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilan-gold like this

@ivirshup ivirshup enabled auto-merge (squash) September 8, 2023 13:54
@ivirshup ivirshup merged commit 8c00a29 into scverse:main Sep 8, 2023
15 of 16 checks passed
@ivirshup ivirshup mentioned this pull request Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants