HDF5 file needs to be split into multiple groups #71

Open
johnlees opened this issue Apr 21, 2022 · 1 comment
Labels
enhancement New feature or request long-term Less urgent features that require more research

Comments


Once the sketch group holds more than roughly 500k sketches, performance becomes very slow. It looks like this is because the HDF5 metadata cache isn't large enough: https://forum.hdfgroup.org/t/limit-on-the-number-of-datasets-in-one-group/5892

I think the solution will be to make subgroups 'sketch1', 'sketch2', etc., each holding a fixed-size block of sketches, say 30k. It just needs a bit of care to make sure it's all backwards compatible.
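
A rough sketch of how the block naming and a backwards-compatible lookup might work, not the actual implementation. The helper names (`subgroup_name`, `find_sketch`), the legacy group name `sketches`, and the exact block size are assumptions for illustration. The lookup only uses `in` and `[]`, so it runs on plain nested dicts here; `h5py.Group` objects support the same two operations.

```python
BLOCK_SIZE = 30_000        # assumed: the "say 30k" figure from the proposal
LEGACY_GROUP = "sketches"  # assumed name of the existing single flat group


def subgroup_name(index, block_size=BLOCK_SIZE):
    """Return the subgroup that would hold the sketch at 0-based position `index`."""
    if index < 0:
        raise ValueError("index must be non-negative")
    # Blocks are numbered from 1 to match the 'sketch1', 'sketch2' naming.
    return f"sketch{index // block_size + 1}"


def find_sketch(root, sample):
    """Look up `sample` in either the old flat layout or the new blocked layout."""
    # Old-style files: every sketch lives directly in one group.
    if LEGACY_GROUP in root and sample in root[LEGACY_GROUP]:
        return root[LEGACY_GROUP][sample]
    # New-style files: scan 'sketch1', 'sketch2', ... until a block is missing.
    block = 1
    while f"sketch{block}" in root:
        group = root[f"sketch{block}"]
        if sample in group:
            return group[sample]
        block += 1
    raise KeyError(sample)


if __name__ == "__main__":
    old_file = {"sketches": {"sampleA": "sketch data A"}}
    new_file = {"sketch1": {"sampleA": "sketch data A"},
                "sketch2": {"sampleB": "sketch data B"}}
    print(find_sketch(old_file, "sampleA"))
    print(find_sketch(new_file, "sampleB"))
    print(subgroup_name(30_000))
```

With these assumed numbers, 500k sketches would be spread over 17 subgroups rather than one. A linear scan over blocks is the simplest option; storing a sample-to-block index alongside the groups would avoid the scan, at the cost of one more thing to keep consistent.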

@johnlees johnlees added the enhancement New feature or request label Apr 21, 2022
@johnlees johnlees self-assigned this Apr 21, 2022
@johnlees johnlees added the long-term Less urgent features that require more research label Apr 29, 2022

johnlees commented May 3, 2022

I'm wondering if switching to Apache Arrow at some point might solve both this and #37.
