I'm wondering whether anyone has thought about best practices for using and versioning AnnData files in a situation where:
- X is big (e.g. 200 GB or more) and doesn't change over time;
- obs and obsm are much smaller, but they do change (scientists changing classifications, adding columns, etc.).
It doesn't seem like a good idea to create a new 200 GB file every time obs is modified. Right now we are thinking about creating two separate files: one that contains only X, and another that has an empty X plus the current version of obs and obsm. But perhaps there are already better ways of dealing with this issue?
@Zethson I don't know the details of the work you're doing on partial reading/writing, but our problem is also the storage and versioning of the file itself. Every time obs or obsm is updated (and that happens quite often), we have to create a new version of the entire file, even though its biggest part, X, is unchanged.
There's a lot of complexity around the versioning side of things, especially being able to tell whether anything has changed, unless the data is managed using something like dask.
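One lightweight way to detect whether obs or obsm actually changed, without dask, is to fingerprint their contents and only write a new version when the fingerprint differs. A minimal sketch, assuming pandas and NumPy: `metadata_fingerprint` is a hypothetical helper, and every obsm value is assumed to be a dense NumPy array (sparse matrices or DataFrames in obsm would need extra handling).

```python
import hashlib

import numpy as np
import pandas as pd


def metadata_fingerprint(obs: pd.DataFrame, obsm: dict) -> str:
    """Hex digest that changes when the content of obs or obsm changes.

    Hypothetical helper; assumes every obsm value is a dense NumPy array.
    """
    h = hashlib.sha256()
    # Per-row hashes of all obs values plus the index (cell names).
    row_hashes = pd.util.hash_pandas_object(obs, index=True)
    h.update(row_hashes.to_numpy().tobytes())
    # Column labels are not part of hash_pandas_object's output, and a
    # categorical column hashes like its plain values, so mix both in.
    h.update(repr([str(c) for c in obs.columns]).encode())
    h.update(repr([str(t) for t in obs.dtypes]).encode())
    # Each obsm entry: key, dtype, shape and raw bytes, in a stable key order.
    for key in sorted(obsm):
        arr = np.ascontiguousarray(obsm[key])
        h.update(key.encode())
        h.update(f"{arr.dtype}{arr.shape}".encode())
        h.update(arr.tobytes())
    return h.hexdigest()
```

Storing the digest next to each saved version makes "has anything changed since the last save?" a string comparison rather than a full diff.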
In the nearer term, I would like to make it easier for users to handle this with a merge function, so you can manage your "deltas" manually.
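To illustrate what applying such a "delta" could look like, here is a hypothetical pandas-only sketch; `apply_obs_delta` is not part of anndata. It assumes a delta is a DataFrame indexed by obs names that may cover only some cells: its columns overwrite existing obs values for the listed cells only, and columns not yet in obs are added with missing values for all other cells.

```python
import pandas as pd


def apply_obs_delta(obs: pd.DataFrame, delta: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of obs with the changes in delta applied (hypothetical helper)."""
    if not delta.index.is_unique:
        raise ValueError("delta has duplicate obs names")
    unknown = delta.index.difference(obs.index)
    if len(unknown) > 0:
        raise KeyError(f"delta refers to unknown cells: {list(unknown)[:5]}")
    out = obs.copy()
    touched = out.index.isin(delta.index)  # rows mentioned in the delta
    for col in delta.columns:
        new_values = delta[col].reindex(out.index)  # NaN for rows not in delta
        if col in out.columns:
            # Keep the old value wherever the delta says nothing about a row.
            out[col] = new_values.where(touched, out[col])
        else:
            out[col] = new_values
    return out
```

Keeping deltas small like this means the full obs for any version can be rebuilt by applying the stored deltas, in order, to the base obs.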
In your case, if you are only updating entries in obs and obsm, I would suggest saving those separately (using the read_elem and write_elem functions). Then you could do something like: