Distributed writing for H5ad format due to h5py objects being unserializable #1105

Closed
3 tasks done
selmanozleyen opened this issue Aug 25, 2023 · 1 comment · Fixed by #1469

Comments

@selmanozleyen
Member

selmanozleyen commented Aug 25, 2023

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

The following code fails:

import anndata as ad
import dask.array as da
import dask.distributed as dd

with dd.LocalCluster(n_workers=1, threads_per_worker=1) as cluster:
    with dd.Client(cluster) as client:
        adata = ad.AnnData(da.random.random((100, 100), chunks=(10, 10)))
        adata.write_h5ad("test.h5ad")

The same code used to fail for both zarr and h5ad, but PR #1079 will fix the zarr case. For h5ad, the h5py serialization problem might be overcome by doing whatever Xarray does, as mentioned in pydata/xarray#4242.
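One way to read the Xarray suggestion: instead of placing an open file object in the task graph, place a small picklable handle that remembers only the path and reopens the file in whichever process unpickles it. The sketch below illustrates that pattern with the standard library only. `ReopeningHandle` and `roundtrip_demo` are hypothetical names, not anndata, h5py, or xarray APIs, and a plain binary file stands in for the HDF5 file. Concurrent writers to a real HDF5 file would also need a lock, which is not shown here.

```python
import os
import pickle
import tempfile


class ReopeningHandle:
    """Hypothetical picklable stand-in for an open, writable file.

    Only the path is pickled; the OS-level handle is reopened lazily
    in whichever process unpickles the object.
    """

    def __init__(self, path):
        self.path = path
        self._fh = None  # opened on first use, never pickled

    def _file(self):
        if self._fh is None:
            # "r+b" updates in place without truncating existing bytes.
            self._fh = open(self.path, "r+b")
        return self._fh

    def write_at(self, offset, data):
        fh = self._file()
        fh.seek(offset)
        fh.write(data)
        fh.flush()

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def __getstate__(self):
        # Drop the unpicklable handle; keep only what is needed to reopen.
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        self._fh = None


def roundtrip_demo():
    """Pickle a handle that is already open, then write through the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.bin")
        with open(path, "wb") as fh:
            fh.write(b"\x00" * 8)  # preallocate, similar to creating a dataset

        handle = ReopeningHandle(path)
        handle.write_at(0, b"ab")  # the handle is now open

        clone = pickle.loads(pickle.dumps(handle))  # succeeds: path only
        clone.write_at(2, b"cd")  # reopens the file on first use

        handle.close()
        clone.close()
        with open(path, "rb") as fh:
            return fh.read()
```

Calling `roundtrip_demo()` pickles a handle whose file is already open for writing, which is exactly the situation that fails with a raw h5py object. The trade-off is that every process reopens the file itself, so a real implementation would have to coordinate writers.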

Traceback:

2023-08-25 11:17:10,491 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f6ae131c700>
 0. 140097021523072
>.
Traceback (most recent call last):
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 29, in reducer_override
    return deserialize, serialize(obj)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/h5py.py", line 24, in serialize_h5py_dataset
    header, _ = serialize_h5py_file(x.file)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/h5py.py", line 11, in serialize_h5py_file
    raise ValueError("Can only serialize read-only h5py files")
ValueError: Can only serialize read-only h5py files

During handling of the above exception, another exception occurred:
...
    return Pickler.dump(self, obj)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
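
The traceback shows two layers failing: distributed's h5py serializer accepts only read-only files, and the pickle fallback is rejected by h5py itself. The standard-library sketch below reproduces the same shape of failure without a cluster, using an ordinary writable file object as a stand-in for the open h5py dataset (an assumption made for illustration). `write_to_handle`, `write_to_path`, `is_picklable`, and `demo` are hypothetical names. A task bound to a live handle cannot be pickled, so it cannot be shipped to a worker, while a task bound only to a path can.

```python
import functools
import os
import pickle
import tempfile


def write_to_handle(fh, offset, data):
    """Hypothetical task: write one block into an already-open file."""
    fh.seek(offset)
    fh.write(data)


def write_to_path(path, offset, data):
    """Hypothetical task: open the file inside the task, then write."""
    with open(path, "r+b") as fh:
        write_to_handle(fh, offset, data)


def is_picklable(task):
    """Return True if the task could be sent to another process via pickle."""
    try:
        pickle.dumps(task)
    except (TypeError, pickle.PicklingError):
        return False
    return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.bin")
        with open(path, "w+b") as fh:
            # Task that closes over a live, writable handle. This mirrors a
            # store task that references an open h5py dataset.
            bound_to_handle = functools.partial(write_to_handle, fh, 0, b"x")
            # Task that carries only plain data (a path string).
            bound_to_path = functools.partial(write_to_path, path, 0, b"x")
            return is_picklable(bound_to_handle), is_picklable(bound_to_path)
```

Here `demo()` reports that the handle-bound task is not picklable and the path-bound task is, which is why approaches that defer opening the file to the worker avoid this class of error.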

Versions

-----
anndata             0.10.0.dev198+ga61d5d4
dask                2023.7.1
distributed         2023.7.1
numpy               1.22.4
pandas              2.0.0
scipy               1.9.3
session_info        1.0.0
zarr                2.13.3
-----
PIL                 9.2.0
asciitree           NA
asttokens           NA
attr                23.1.0
awkward             2.1.0
awkward_cpp         NA
backcall            0.2.0
bokeh               2.4.3
cffi                1.15.1
click               8.1.3
cloudpickle         2.2.0
colorama            0.4.6
comm                0.1.1
cython_runtime      NA
cytoolz             0.12.0
...
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 15:55:03) [GCC 10.4.0]
Linux-6.1.44-1-MANJARO-x86_64-with-glibc2.38
-----

Session information updated at 2023-08-25 11:18

@github-actions

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

@github-actions github-actions bot added the stale label Oct 25, 2023
@ivirshup ivirshup added pinned and removed stale labels Dec 12, 2023
@ivirshup ivirshup added this to the 0.11.0 milestone Dec 12, 2023
@ilan-gold ilan-gold self-assigned this Jun 3, 2024