Specs for all elements #554

ivirshup · 2021-04-15T08:53:10Z

This PR adds specs for each element of an anndata object. That is, on disk each element will have the attributes "encoding-type" and "encoding-version". For more info, see issue #555.
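As a rough illustration of the idea (not the code in this PR), the sketch below tags each stored element with the two attributes and dispatches reads on them. Plain dicts stand in for HDF5/zarr groups. `_READERS`, `register_reader`, the storage layout, and the `"toy-array"` / `"0.1.0"` values are all hypothetical; only the names `read_elem`, `write_elem`, `"encoding-type"` and `"encoding-version"` come from this PR, and the functions here are toy versions of them.

```python
# Toy model: a stored element is a dict with "attrs" (metadata) and "data".
# Everything except the two attribute names is a hypothetical stand-in.
_READERS = {}


def register_reader(encoding_type, encoding_version):
    """Register a reader function for one (type, version) pair."""
    def decorator(func):
        _READERS[(encoding_type, encoding_version)] = func
        return func
    return decorator


def write_elem(store, key, value):
    """Store a list under `key`, tagging it with its encoding metadata."""
    store[key] = {
        "attrs": {"encoding-type": "toy-array", "encoding-version": "0.1.0"},
        "data": list(value),
    }


@register_reader("toy-array", "0.1.0")
def _read_toy_array(elem):
    return list(elem["data"])


def read_elem(elem):
    """Look up the reader from the element's own attributes."""
    attrs = elem["attrs"]
    spec = (attrs["encoding-type"], attrs["encoding-version"])
    if spec not in _READERS:
        raise KeyError(f"No reader registered for {spec}")
    return _READERS[spec](elem)


store = {}
write_elem(store, "X", [1, 2, 3])
assert read_elem(store["X"]) == [1, 2, 3]
```

Because the reader is chosen from what is written on the element itself, an element with no attributes can be detected and handled separately as an old-format file.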

TODO:

Maybe this PR, maybe not:

I have decided: not

~~- [ ] Consolidate "modified" reading methods~~
~~- [ ] Figure out what the scope is for an individual resolver~~
~~- [ ] Partial IO? (probably not until during the 0.8.x release series)~~

This might also be the right time to cut a 0.7 maintenance branch, and make master 0.8 specific.

Used current `read_elem`, `write_elem` framework for h5ad files

codecov · 2021-04-15T08:57:54Z

Codecov Report

Merging #554 (ee1e3df) into master (9520e59) will decrease coverage by 0.33%.
The diff coverage is 86.71%.

@@            Coverage Diff             @@
##           master     #554      +/-   ##
==========================================
- Coverage   83.41%   83.07%   -0.34%     
==========================================
  Files          31       34       +3     
  Lines        4148     4349     +201     
==========================================
+ Hits         3460     3613     +153     
- Misses        688      736      +48

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| anndata/_io/__init__.py | `100.00% <ø> (ø)` | |
| anndata/compat/__init__.py | `85.14% <77.77%> (-1.11%)` | ⬇️ |
| anndata/_io/specs/methods.py | `83.28% <83.28%> (ø)` | |
| anndata/_io/h5ad.py | `92.45% <90.62%> (-2.57%)` | ⬇️ |
| anndata/_io/specs/registry.py | `91.48% <91.48%> (ø)` | |
| anndata/_io/zarr.py | `88.60% <95.83%> (-5.24%)` | ⬇️ |
| anndata/__init__.py | `100.00% <100.00%> (ø)` | |
| anndata/_core/sparse_dataset.py | `90.19% <100.00%> (-0.36%)` | ⬇️ |
| anndata/_io/specs/__init__.py | `100.00% <100.00%> (ø)` | |
| anndata/_io/utils.py | `72.89% <100.00%> (+2.10%)` | ⬆️ |

... and 7 more

OldFormatWarning is raised when reading a file so old that its elements don't have an encoding type. A test was added to check that newly written files don't raise it.

ivirshup · 2021-11-17T17:18:00Z

We now have warnings for when an element is found that doesn't have a spec. These are not user visible, but we should silence them further for the tests.

ivirshup · 2021-11-23T17:58:22Z

~~Working on zarr implementation. Seems mostly good, but running into the json only being able to represent a limited set of values.~~

~~Current problem: bool. Maybe I just encode these attributes as 0 and 1 for now?~~

Forgot about the json stdlib module not liking any numpy types.
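A minimal sketch of one way around this, assuming the attributes are plain mappings serialized with the stdlib `json` module. The helper names `to_builtin` and `dumps_attrs` are hypothetical, and this is not necessarily how the PR handles it (the commit notes only say `np.bool_` is converted to `bool`). The hook relies on NumPy scalars exposing `.item()`, which returns the equivalent Python builtin.

```python
import json


def to_builtin(obj):
    """Fallback for json.dumps: unwrap scalars that expose .item()."""
    item = getattr(obj, "item", None)
    if callable(item):
        return item()
    raise TypeError(f"Cannot serialize {type(obj).__name__} to JSON")


def dumps_attrs(attrs):
    """Serialize an attribute mapping, converting scalar wrappers on the way."""
    return json.dumps(attrs, default=to_builtin)
```

With NumPy installed, `dumps_attrs({"flag": np.bool_(True)})` gives `'{"flag": true}'`, whereas a bare `json.dumps` raises `TypeError` because `np.bool_` is not a subclass of `bool`.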

ivirshup · 2021-11-23T19:00:09Z

Huh, pretty sure I meant to call c0112fc something different. Like "start zarr support".

Still needs some work. Like most backwards compat and figuring out what I can delete.

* convert np.bool_ to bool so json module doesn't error
* actually label `str` elements as strings
* read strings back as `str` not `np.str_`

flying-sheep · 2021-12-16T09:57:40Z

I think with #662, there’s no real way around encoding-type: "array", "element-type": "string", right?

ivirshup · 2021-12-16T10:41:51Z

Long term yes, but I think this can go through without it. I think designing that system is worth a bit more care, and we may be able to base that design on (or at least get help with it from) a "single cell data interchange" schema (will hopefully hear more on that in January).

The way around this for now is basically just long encoding names. So "string-array", "datetime-array", "nullable-datetime-array", etc.
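To make the contrast concrete, here is a small sketch of the two naming schemes. Both helper functions are hypothetical and exist only for illustration; the flat names and the `"element-type"` key are the ones mentioned in this thread.

```python
def flat_encoding_type(element_kind, nullable=False):
    """Build a flat name such as "nullable-datetime-array"."""
    name = f"{element_kind}-array"
    return f"nullable-{name}" if nullable else name


def parametrized_encoding(element_kind):
    """The alternative raised above: one generic type plus a parameter."""
    return {"encoding-type": "array", "element-type": element_kind}


assert flat_encoding_type("string") == "string-array"
assert flat_encoding_type("datetime", nullable=True) == "nullable-datetime-array"
```

The flat scheme needs one registered name per combination, but it avoids having to design a parameter system up front.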

ivirshup added 4 commits April 15, 2021 16:47

Initial replacement

56f4fc8

Used current `read_elem`, `write_elem` framework for h5ad files

Fixed most backed tests

0c7b1f1

Fix for masked arrays

473efb8

Fix sparse dataset indexing return type (needs test)

d1246a4

ivirshup added this to the 0.8 milestone Apr 15, 2021

ivirshup mentioned this pull request May 13, 2021

Began adding an R-native writer. theislab/zellkonverter#48

Open

ivirshup added 9 commits November 17, 2021 14:21

Merge branch 'master' into specs-basic

9099e58

Remove old io interface for h5ad

00f7065

Update spec readers to use permissive hdf5 attribute reader

206cad7

Rename _read_hdf5_attribute to _read_attr

d20cd5c

Add OldFormatWarning, and make sure new files don't throw one

38c12b5

OldFormatWarning is raised when reading a file so old that its elements don't have an encoding type. A test was added to check that newly written files don't raise it.
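The behaviour described in this commit could be sketched roughly as follows. This is an assumption-laden toy, not the PR's code: the `OldFormatWarning` class below is a local stand-in (its base class is a guess), and `check_encoding` and `count_old_format_warnings` are hypothetical helpers operating on plain dicts of attributes.

```python
import warnings


class OldFormatWarning(UserWarning):
    """Local stand-in for the warning class this PR adds."""


def check_encoding(name, attrs):
    """Warn if an element's attributes carry no encoding type."""
    if "encoding-type" not in attrs:
        warnings.warn(
            f"Element {name!r} has no encoding-type; the file was likely "
            "written by an older version.",
            OldFormatWarning,
            stacklevel=2,
        )


def count_old_format_warnings(elements):
    """Return how many OldFormatWarnings checking `elements` emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        for name, attrs in elements.items():
            check_encoding(name, attrs)
    return sum(issubclass(w.category, OldFormatWarning) for w in caught)
```

A test in the spirit of the one described would write a fresh file, read it back, and assert that this count is zero.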

Merge branch 'master' into specs-basic

8938d36

Fix some typing warnings

5437858

Add scanpy requirement for test

665f124

Reorganize warning imports to have short paths

c874776

ivirshup added 4 commits November 17, 2021 19:05

Support AnnData's in .uns

a4259f7

Minor spec cleanup

c95d5fe

Hide some warnings at test time

c0112fc

Merge branch 'master' into specs-basic

2b6d7d1

ivirshup added 5 commits November 23, 2021 19:38

Make test collection work

b6f3dfe

Remove copies of warning definitions

c05e025

Fix pre-commit check

d497ad4

Remove copies of warning definitions

a055494

Add support for zarr recarrays

cce53c5

Fix some small zarr compatability issues

b0d0d08

* convert np.bool_ to bool so json module doesn't error
* actually label `str` elements as strings
* read strings back as `str` not `np.str_`

giovp mentioned this pull request Nov 29, 2021

first attempt to support awkward arrays #647

Merged

17 tasks

ivirshup mentioned this pull request Dec 10, 2021

Meta issue: new data types #662

Open

8 tasks

ivirshup added 6 commits December 14, 2021 17:44

Deprecations for read_attribute and write_attribute

5712bf9

Added support for categorical arrays to test helpers

ce29ab2

Add basic tests for per spec writing

e8b0f34

Add tests for writing raw

b8612f3

More doc-string for write_elem

f22e17b

Basic zarr backwards compat (needs cleanup)

ffc09a6

ivirshup added 4 commits December 20, 2021 14:57

Write encoding type information with .write_* methods

704f6f6

Cleanup _io/zarr.py and remove EncodingVersions type

bc917c7

Remove some unused imports from h5ad.py

5e7a072

Also test zarr backend for anndatas in uns

2a52fd1

ivirshup marked this pull request as ready for review December 20, 2021 14:36

ivirshup mentioned this pull request Dec 20, 2021

anndata 0.8 compat scverse/mudata#8

Merged

ivirshup added 11 commits December 20, 2021 16:28

Fix read_dataframe reading strings as bytes for v0.1.0 dataframe encoding

c7d63ca

Start updating docs for fileformat

6e73262

Allow calling write_elem(f, '/', v)

ffb7add

Start backwards compat tests

c6cbac9

Reorganize specs module

2c9a54e

Slightly more useful OldFormatWarnings

4b4553d

Ignore OldFormatWarnings on case by case basis. Will be loud until scverse/scanpy#2090 is fixed

49ea48f

Use read_elem when possible

19e2b38

Remove redundant dataframe reading code

d4481cf

Remove unused functions

f8573a0

Improve formating of write errors

ee1e3df

ivirshup enabled auto-merge (squash) December 25, 2021 17:08

ivirshup merged commit 664e32b into scverse:master Dec 25, 2021

ivirshup deleted the specs-basic branch December 25, 2021 17:11

keller-mark mentioned this pull request Mar 27, 2022

Widget is hanging without success vitessce/vitessce-python#124

Closed
