Can't save MuData object to h5mu file #57

Closed
josenachorr opened this issue May 18, 2022 · 15 comments
Labels
bug Something isn't working

Comments

@josenachorr

josenachorr commented May 18, 2022

I created a MuData object that contains the AnnData for 2 modalities, did some basic filtering of the datasets and then tried to save it with: joint.write("joint_data.h5mu") but this throws the following error:

TypeError                                 Traceback (most recent call last)
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in get_writer(self, dest_type, typ, modifiers)
     63         if (dest_type, typ, modifiers) not in self.write:
---> 64             raise TypeError(
     65                 f"No method has been defined for writing {typ} elements to {dest_type}"

TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/tmp/efec76988f/ipykernel_20272/4022115007.py in <module>
----> 1 joint.write("../Merged/929_cancer/929_cancer_joint_data.h5mu")

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/mudata.py in write_h5mu(self, filename, **kwargs)
   1084             raise ValueError("Provide a filename!")
   1085         else:
-> 1086             write_h5mu(filename, self, **kwargs)
   1087             if self.isbacked:
   1088                 self.file.filename = filename

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in write_h5mu(filename, mdata, **kwargs)
    207 
    208     with h5py.File(filename, "w", userblock_size=512) as f:
--> 209         _write_h5mu(f, mdata, **kwargs)
    210     with open(filename, "br+") as f:
    211         nbytes = f.write(

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in _write_h5mu(file, mdata, write_data, **kwargs)
     44         dataset_kwargs=kwargs,
     45     )
---> 46     write_attribute(file, "obsm", mdata.obsm, dataset_kwargs=kwargs)
     47     write_attribute(file, "varm", mdata.varm, dataset_kwargs=kwargs)
     48     write_attribute(file, "obsp", mdata.obsp, dataset_kwargs=kwargs)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in write_attribute(*args, **kwargs)
    132         DeprecationWarning,
    133     )
--> 134     return write_elem(*args, **kwargs)
    135 
    136 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    218             else:
    219                 parent = _get_parent(elem)
--> 220                 raise type(e)(
    221                     f"{e}\n\n"
    222                     f"Above error raised while writing key {key!r} of {type(elem)} "

TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

Above error raised while writing key 'obsm' of <class 'h5py._hl.files.File'> to /

I also tried to save only a MuData object with just the raw matrices (no more metadata), but it throws the same error, also when trying to save each of the modalities alone (in a MuData object with only 1 modality).

I am using python '3.8.12', scanpy '1.9.1' and muon '0.1.2'

Thank you for your help, this is a very useful tool.

@josenachorr josenachorr added the bug Something isn't working label May 18, 2022
@gtca
Collaborator

gtca commented May 18, 2022

Hey @josenachorr, thanks for reporting, which anndata version would that be?
If this is the latest anndata release v0.8, MuData is not fully compatible with it just yet, but we'll make a corresponding release soon (see the progress in scverse/mudata#8).

Please note this is expected to be fixed by an upgrade to the mudata library (https://github.com/scverse/mudata) as that's where the respective I/O code is.
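
For anyone unsure which versions are installed, here is a minimal sketch that reports them using only the standard library. The helper name `installed_versions` is hypothetical and not part of mudata or anndata; the package list is simply the one mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages discussed in this issue; adjust the list as needed.
    report = installed_versions(["anndata", "mudata", "muon", "scanpy"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the same environment (or notebook kernel) that raises the error should show whether anndata 0.8 is paired with an older mudata.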

@gtca
Collaborator

gtca commented May 18, 2022

@josenachorr, and in case you wanted to try that scverse/mudata#8 PR out and let us know if it works for you, that would also be great of course!

@matthew-levy

Besides not being able to write to h5mu files, are there any blatant issues with working with AnnData v0.8? I have several v0.8 files written from Scanpy that I intend to load into Muon and assign to the RNA aspect of the MuData object so I don't think I can install a previous version.

@gtca
Collaborator

gtca commented May 19, 2022

Only the I/O should be affected due to the changes in AnnData.
scverse/mudata#8 seems to pass the existing tests so I expect we'll merge it soon.
You can also give it a try of course before it's merged, e.g. like this or with gh:

git clone https://github.com/scverse/mudata
cd mudata
gh pr checkout 8
pip install -e .

@josenachorr
Author

@gtca Thank you for your fast reply!
My version of anndata is indeed 0.8.0; I think it's the one that comes by default with the latest version of scanpy. Unfortunately, I can't install scverse/mudata#8 in my environment (I don't have permissions), so I'll just wait for the official update.

@matthew-levy

Only the I/O should be affected due to the changes in AnnData. scverse/mudata#8 seems to pass the existing tests so I expect we'll merge it soon. You can also give it a try of course before it's merged, e.g. like this or with gh:

git clone https://github.com/scverse/mudata
cd mudata
gh pr checkout 8
pip install -e .

I'm sorry, I'm unfamiliar with this process. How can I do this in Windows with my python installation via Anaconda?

@gtca
Collaborator

gtca commented May 24, 2022

@josenachorr and @matthew-levy, you should be able to give it a go with the master branch from GitHub now, e.g.:

pip install git+https://github.com/scverse/mudata

@josenachorr
Author

Thanks @gtca I could install it with no problem. Unfortunately, an error still occurs when trying to write the object (a different one this time):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in wrapper(g, k, *args, **kwargs)
     23         def wrapper(g, k, *args, **kwargs):
---> 24             result = func(g, k, *args, **kwargs)
     25             g[k].attrs.setdefault("encoding-type", spec.encoding_type)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/methods.py in write_dataframe(f, key, df, dataset_kwargs)
    496         if reserved in df.columns:
--> 497             raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.")
    498     group = f.create_group(key)

ValueError: '_index' is a reserved name for dataframe columns.

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/tmp/697d46de78/ipykernel_2510/4022115007.py in <module>
----> 1 joint.write("../Merged/929_cancer/929_cancer_joint_data.h5mu")

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/mudata.py in write_h5mu(self, filename, **kwargs)
   1084             raise ValueError("Provide a filename!")
   1085         else:
-> 1086             write_h5mu(filename, self, **kwargs)
   1087             if self.isbacked:
   1088                 self.file.filename = filename

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in write_h5mu(filename, mdata, **kwargs)
    205 
    206     with h5py.File(filename, "w", userblock_size=512) as f:
--> 207         _write_h5mu(f, mdata, **kwargs)
    208     with open(filename, "br+") as f:
    209         nbytes = f.write(

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in _write_h5mu(file, mdata, write_data, **kwargs)
     69             write_elem(group, "X", adata.X, dataset_kwargs=kwargs)
     70         if adata.raw is not None:
---> 71             write_elem(group, "raw", adata.raw)
     72 
     73         write_elem(group, "obs", adata.obs, dataset_kwargs=kwargs)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    212     def func_wrapper(elem, key, val, *args, **kwargs):
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:
    216             if "Above error raised while writing key" in format(e):

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    173         )
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176 
    177 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in wrapper(g, k, *args, **kwargs)
     22         @wraps(func)
     23         def wrapper(g, k, *args, **kwargs):
---> 24             result = func(g, k, *args, **kwargs)
     25             g[k].attrs.setdefault("encoding-type", spec.encoding_type)
     26             g[k].attrs.setdefault("encoding-version", spec.encoding_version)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/methods.py in write_raw(f, k, raw, dataset_kwargs)
    257     g = f.create_group(k)
    258     write_elem(g, "X", raw.X, dataset_kwargs=dataset_kwargs)
--> 259     write_elem(g, "var", raw.var, dataset_kwargs=dataset_kwargs)
    260     write_elem(g, "varm", dict(raw.varm), dataset_kwargs=dataset_kwargs)
    261 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    218             else:
    219                 parent = _get_parent(elem)
--> 220                 raise type(e)(
    221                     f"{e}\n\n"
    222                     f"Above error raised while writing key {key!r} of {type(elem)} "

ValueError: '_index' is a reserved name for dataframe columns.

Above error raised while writing key 'var' of <class 'h5py._hl.group.Group'> to /

@gtca
Collaborator

gtca commented May 24, 2022

Hey @josenachorr,

I think this is an AnnData v0.8 thing. The following code causes the same error:

import numpy as np
from anndata import AnnData

x = np.random.normal(size=(10,20))
ad = AnnData(x, dtype=np.float32)
ad.obs["_index"] = "test"
ad.write("issue57.h5ad")
# => ValueError: '_index' is a reserved name for dataframe columns.
# => Above error raised while writing key 'obs' of <class 'h5py._hl.group.Group'> to /

I'll also tag @ivirshup for this.
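
One possible workaround, sketched here under assumptions rather than taken from anndata itself, is to rename any column called `_index` before writing. The helper `rename_reserved` and the `_orig` suffix are hypothetical; the only fact used from the traceback is that `_index` is rejected as a column name.

```python
RESERVED = ("_index",)  # the column name rejected in the traceback above


def rename_reserved(columns, reserved=RESERVED, suffix="_orig"):
    """Return {old: new} for every reserved column name, avoiding clashes."""
    taken = set(columns)
    mapping = {}
    for name in columns:
        if name in reserved:
            new = name + suffix
            # Keep appending the suffix until the new name is unused.
            while new in taken:
                new += suffix
            taken.add(new)
            mapping[name] = new
    return mapping


# Plain column names stand in for a real DataFrame's columns here.
print(rename_reserved(["n_counts", "_index", "gene_ids"]))
```

The returned mapping can be passed to pandas, e.g. `adata.obs = adata.obs.rename(columns=rename_reserved(list(adata.obs.columns)))`. In the traceback reported above the offending frame was `adata.raw.var`, which may need to be handled separately.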

@gtca
Collaborator

gtca commented Jun 27, 2022

I believe the issues related to muon / MuData raised here have been resolved.
For the issue related to the _index column, I'll link scverse/anndata#731 here as it might be related.
Feel free to open new issues!

@gtca gtca closed this as completed Jun 27, 2022
@dburkhardt

dburkhardt commented Jul 1, 2022

Can we reopen this and pin the current version of mudata to an older version of anndata? This isn't resolved:

Files to reproduce 👇
data.zip

import anndata as ad
import mudata as mu

print("anndata version: " + str(ad.__version__))
print("mudata version: " + str(mu.__version__))

rna = ad.read_h5ad("./rna.small.h5ad")
atac = ad.read_h5ad("./atac.small.h5ad")

mdata = mu.MuData({'rna':rna, 'atac':atac})
mdata.write('mdata_minrep.h5mu')

outputs

anndata version: 0.8.0
mudata version: 0.1.2
Unexpected exception formatting exception. Falling back to standard exception

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
    f"Above error raised while writing key {key!r} of {type(elem)}"
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 64, in get_writer
TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_45899/4077421615.py", line 11, in <cell line: 11>
    mdata.write('mdata_minrep.h5mu')
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/mudata/_core/mudata.py", line 1086, in write_h5mu
    write_h5mu(filename, self, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/mudata/_core/io.py", line 209, in write_h5mu
    _write_h5mu(f, mdata, **kwargs)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/mudata/_core/io.py", line 46, in _write_h5mu
    write_attribute(file, "obsm", mdata.obsm, dataset_kwargs=kwargs)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/utils.py", line 134, in write_attribute
    # -------------------------------------------------------------------------------
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/anndata/_io/utils.py", line 220, in func_wrapper
TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

Above error raised while writing key 'obsm' of <class 'h5py._hl.files.File'> to /

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 1993, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1118, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1012, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 865, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 818, in format_exception_as_a_whole
    frames.append(self.format_record(r))
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/IPython/core/ultratb.py", line 736, in format_record
    result += ''.join(_format_traceback_lines(frame_info.lines, Colors, self.has_colors, lvals))
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/core.py", line 698, in lines
    pieces = self.included_pieces
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/core.py", line 649, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/utils.py", line 145, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/stack_data/core.py", line 628, in executing_piece
    return only(
  File "/srv/conda/envs/saturn/lib/python3.9/site-packages/executing/executing.py", line 164, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

@gtca
Collaborator

gtca commented Jul 1, 2022

Hey @dburkhardt,

Thanks for making it very easy to run your use case for me!
It works, and I get this output after running your code on your files with the current mudata master branch:

anndata version: 0.8.0
mudata version: 0.2.0

There's still a possibility that I misunderstand your message, but AnnData v0.8 is forward-incompatible: anndata < 0.8 can't read files written with the new serialisation. As mudata is lean and reuses anndata I/O internals, its older versions can't use anndata >= 0.8, because the serialisation internals were changed. For that reason mudata >= 0.2 pins its dependency to anndata >= 0.8, so upon installing mudata >= 0.2 (the PyPI release will be there soon) a package manager should make sure that anndata is >= 0.8. If anndata is upgraded to a forward-incompatible version after mudata has been installed, there's not much we can do, I think: mudata 0.1.2 specifies anndata < 0.8 as its dependency.
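
The pairing rule described above can be written as a small check. This is only a sketch of the rule as stated in this comment (mudata < 0.2 with anndata < 0.8, mudata >= 0.2 with anndata >= 0.8); the function names are hypothetical and the code does not read mudata's actual package metadata.

```python
def parse_version(text):
    """Turn '0.8.0' into (0, 8, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for chunk in text.split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def io_compatible(mudata_version, anndata_version):
    """True when both packages fall on the same side of the I/O change."""
    new_mudata = parse_version(mudata_version) >= (0, 2)
    new_anndata = parse_version(anndata_version) >= (0, 8)
    return new_mudata == new_anndata


# The combination reported in this issue: old mudata with new anndata.
print(io_compatible("0.1.2", "0.8.0"))
# The combination used with the current master branch.
print(io_compatible("0.2.0", "0.8.0"))
```

A mismatch here corresponds to the `MuAxisArrays` writer error shown earlier in the thread, where mudata 0.1.2 was running against anndata 0.8.0.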

@dburkhardt

Thanks @gtca! Can you please help me understand the reason why mudata 0.2 isn't on PyPI yet? Is there some reason we should just start using that today?

@gtca
Collaborator

gtca commented Jul 15, 2022

@dburkhardt, unless I'm missing something, it is though:

image

@dburkhardt

Hmm okay, some folks on our team are still hitting this issue, I need to go check what versions they're using
