Guidance on best practice for saving DD arrays? #860

Open
alex-s-gardner opened this issue Nov 14, 2024 · 6 comments

Comments

@alex-s-gardner
Contributor

alex-s-gardner commented Nov 14, 2024

Is there a best practice for saving DD arrays? For my large workflows I need checkpoints where I can save the state from which the processing can be restarted. Right now I'm using JLD2. Is that the best approach? Does DD want to explicitly "support" saving in this format, such that DD minimizes changes that could break loading of data archived with JLD2? Would it be helpful to add "Saving DD data" to the documentation?

@rafaqz
Owner

rafaqz commented Nov 14, 2024

JLD2 is bad long term, as it's locked to a DD version. I would use NetCDF via Rasters.jl or YAX.

We can't promise not to break JLD2 files. Even with 1.0 we need the freedom to add fields to structs and to add type parameters.

@felixcremer
Contributor

Could we use Zarr to save plain DimArrays? Nothing in the Zarr spec is geo-specific, so it should be possible. We would instead have to map the DimArray layout to Zarr directly. But I am not sure how this would interact with the Zarr handling code in YAXArrays or Rasters.

@alex-s-gardner
Contributor Author

alex-s-gardner commented Nov 14, 2024

I'm realizing that Julia's powerful and flexible type system becomes its Achilles heel when it comes to saving data. With Matlab you can just save everything in your workspace to a .MAT file, and it will stay backward compatible if the .MAT format is updated. It seems that JLD2 is Julia's best attempt at this, but the freedom for each package to define its own types makes archival saving nearly impossible without conforming to an external data standard. That means no mixing of DataFrames, DD arrays, and other such unique and creative data combinations.

Regardless, we should probably have some guidance on what one's options are for saving a DD array to disk.

@rafaqz
Owner

rafaqz commented Nov 15, 2024

In most languages, saving the workspace like that is bad practice except for short-term personal use. You want some real standardised serialisation.

Felix is right: Zarr is probably the best array format, but Rasters doesn't write it yet, so use YAX.

But forget JLD2 as anything but short-term personal use.
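
For reference, a short-term checkpoint of the kind discussed here might look like the sketch below. It assumes DimensionalData.jl and JLD2.jl are installed; the array contents, dimension values, and file name are hypothetical. As noted above, a file written this way is only expected to load reliably with the same package versions that wrote it.

```julia
using DimensionalData, JLD2

# Hypothetical example data: a small 2-D DimArray.
A = DimArray(rand(3, 4), (X(1:3), Y(10:10:40)); name=:example)

# Write a checkpoint to a temporary directory (short-term use only).
path = joinpath(mktempdir(), "checkpoint.jld2")
save_object(path, A)

# Restart from the checkpoint, using the same DD version that wrote it.
A2 = load_object(path)
```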

@alex-s-gardner
Contributor Author

If we (I) were to add documentation on best practice for saving DD arrays, would the recommendation be to use YAX (or JLD2 for temporary storage)? Or is it simply best to leave this undefined and let users determine the saving strategy that best suits their needs?

@rafaqz
Owner

rafaqz commented Dec 15, 2024

We could mention YAX and Rasters, with the choice depending on what you want to save as. I wouldn't recommend JLD2 in the docs, because people will complain if I add a type parameter to a Lookup and break their .jld2 files, but that is allowed to happen at any time.
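
As an illustration of the Rasters route, here is a minimal sketch of writing a DimArray to NetCDF. It assumes Rasters.jl and NCDatasets.jl are installed and that the array has X and Y dimensions; the data, dimension values, and file name are hypothetical, and the details may differ between Rasters versions.

```julia
using DimensionalData, Rasters
import NCDatasets  # loading this package enables NetCDF support in Rasters

# Hypothetical example data: a small 2-D DimArray with X and Y dimensions.
A = DimArray(rand(3, 4), (X(1.0:3.0), Y(10.0:10.0:40.0)); name=:example)

# Wrap the DimArray as a Raster and write it to a NetCDF file.
path = joinpath(mktempdir(), "example.nc")
write(path, Raster(A))

# Read the file back as a Raster.
B = Raster(path)
```

Because the file follows an external standard rather than Julia's struct layout, it should stay readable if DD later changes its types.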
