Translate h5py soft and hard linked datasets with an optional kwarg #463
I'll look into the test failure, which looks unrelated. It may be something to do with numpy 2, which just came out.
(update: I have fixed one test failure so far, so one more to go)
If you sync your branch, things should pass now.
@martindurant thanks for helping @ljwoods2 with this! This is part of his GSoC 2024 work with MDAnalysis, which I am co-supervising. With that hat on, would you be able to approve the workflows here, by any chance? Getting this merged would help us a lot moving forward. Let me know if I can help in any way. 😄
Sorry, new commits by themselves don't send a notification. Would you like me to add a label to this PR for GSoC?
I replaced the string version compatibility check with a direct check of whether the
Thanks a lot @martindurant! Appreciate your help here 👍
Fixes #459
Linked groups can't be translated as easily as linked datasets: the callback passed to the h5py method `visititems_links` is called once per link, and a group reachable through several links is only descended into once. This means that if you have this HDF5 layout, for example:
And you try to create this hard link and translate it:
Then `visititems_links` will be called on "box/edges/step", "time", and "value" during the traversal of "particles/trajectory", so they can't be traversed again during the traversal of "particles/trajectory2", resulting in an empty translated group. This behavior makes sense, since it avoids the problem of circularly linked groups, but it makes translating linked groups with h5py's built-in traversal methods difficult. If zarr-python 3.0 implements links, `visititems` and `visititems_links` may have to be replaced with custom directory-traversal code.

This PR only allows translating linked datasets. It does so by traversing the HDF5 hierarchy twice and using `require_dataset` instead of `create_dataset`, so that non-linked groups and datasets are not duplicated.
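The two traversal behaviours and the two-pass workaround can be modelled in plain Python. This is a toy sketch, not h5py, zarr, or kerchunk code: `Group`, `Dataset`, `visit_objects_once`, `visit_links_once_per_group`, `require_dataset` and `translate` are hypothetical stand-ins, and the target store is just a dict. It assumes the first pass visits each object once (as `visititems` does) and the second visits every link while descending into each group only once (the `visititems_links` behaviour described above).

```python
from dataclasses import dataclass, field


# Toy stand-ins for HDF5 objects (hypothetical, not the h5py API).
@dataclass(eq=False)
class Dataset:
    data: list


@dataclass(eq=False)
class Group:
    children: dict = field(default_factory=dict)  # link name -> Group | Dataset


def visit_objects_once(group, prefix="", seen=None):
    """Yield (path, obj) once per object, modelling h5py's visititems."""
    if seen is None:
        seen = {id(group)}
    for name, obj in group.children.items():
        if id(obj) in seen:
            continue  # object already reached through another hard link
        seen.add(id(obj))
        path = f"{prefix}/{name}" if prefix else name
        yield path, obj
        if isinstance(obj, Group):
            yield from visit_objects_once(obj, path, seen)


def visit_links_once_per_group(group, prefix="", seen=None):
    """Yield (path, obj) for every link, but descend into each group only once."""
    if seen is None:
        seen = {id(group)}
    for name, obj in group.children.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path, obj
        if isinstance(obj, Group) and id(obj) not in seen:
            seen.add(id(obj))
            yield from visit_links_once_per_group(obj, path, seen)


def require_dataset(store, path, dset):
    """Get-or-create: leave an existing entry alone instead of duplicating it."""
    return store.setdefault(path, ("dataset", dset.data))


def translate(root):
    store = {}
    # Pass 1: every object once, under the first path that reaches it.
    for path, obj in visit_objects_once(root):
        if isinstance(obj, Group):
            store.setdefault(path, ("group", None))
        else:
            require_dataset(store, path, obj)
    # Pass 2: every link; only linked datasets add new entries.
    for path, obj in visit_links_once_per_group(root):
        if isinstance(obj, Dataset):
            require_dataset(store, path, obj)
    return store


# Example layout: one group and one dataset, each reachable via two hard links.
value = Dataset([1.0, 2.0])
traj = Group({"step": Dataset([0, 1]), "value": value})
root = Group({
    "particles": Group({"trajectory": traj, "trajectory2": traj}),
    "value_alias": value,
})
store = translate(root)
print(sorted(store))
# ['particles', 'particles/trajectory', 'particles/trajectory/step', 'particles/trajectory/value', 'value_alias']
```

In this model, `value_alias` is translated because the second pass sees every link, and the `setdefault`-based `require_dataset` keeps the datasets already created in the first pass from being written twice. `particles/trajectory2` gets no entries, because its children were already consumed under `particles/trajectory`, which is the group limitation described above.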