
Added more metadata to Dask Dataframe creation #19

Merged: 3 commits merged into main from fix-fast-zarr-open on Feb 19, 2024

Conversation

alxmrs (Owner) commented on Feb 19, 2024

Fixed #17. It looks like the dataset is, in fact, opened lazily; `len(era5_df)`, however, requires a full scan. I opened #18 to address the length issue.
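A minimal sketch of the distinction above (the `make_chunk` helper and the column name are illustrative, not this project's code): building the Dask DataFrame returns immediately, but `len()` must compute every partition.

```python
import dask.dataframe as dd
import pandas as pd

def make_chunk(i):
    # Stand-in for converting one Zarr/xarray chunk to a pandas DataFrame.
    return pd.DataFrame({"t2m": [280.0 + i, 281.0 + i, 282.0 + i]})

ddf = dd.from_map(make_chunk, range(1_000))  # lazy: returns right away
n = len(ddf)  # full scan: every one of the 1,000 partitions is computed
```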

It should return right away, since we want to convert the chunks lazily. From the profile traces, though, it looks like `to_dd` converts the chunks eagerly.
I found that both `from_delayed` and `from_map` took forever to get the length of the ERA5 dataframe. This looks like a more fundamental issue with Dask DataFrames. Instead, I checked how fast it was to get the columns.
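A hedged sketch of what the PR title suggests: pass explicit metadata (column names and dtypes) when creating the Dask DataFrame, so Dask can answer schema questions like `.columns` without evaluating any chunk. The `make_chunk` helper and the column names here are assumptions for illustration, not the project's actual `to_dd` implementation.

```python
import dask.dataframe as dd
import pandas as pd

def make_chunk(i):
    # Stand-in for converting one Zarr/xarray chunk to a pandas DataFrame.
    return pd.DataFrame({"lat": [0.0, 0.25, 0.5], "t2m": [280.0, 281.0, 282.0]})

# An empty frame describing the schema shared by every chunk.
meta = pd.DataFrame({
    "lat": pd.Series(dtype="float64"),
    "t2m": pd.Series(dtype="float64"),
})

ddf = dd.from_map(make_chunk, range(1_000), meta=meta)  # no chunk is evaluated here
print(ddf.columns)  # answered from `meta`; returns immediately
# len(ddf) still requires a full scan -- that is tracked in #18.
```

`dd.from_delayed` accepts the same `meta=` keyword, so the idea applies to either construction path mentioned above.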
alxmrs merged commit 7da2184 into main on Feb 19, 2024
alxmrs deleted the fix-fast-zarr-open branch on February 19, 2024 at 11:14
Development

Successfully merging this pull request may close these issues:

Opening large Zarr datasets should be lazy (and fast)