Output predictions as zarr dataset #89
This is a great idea. Some thoughts that come to mind:
So to summarize, this is clearly required. I suggest making the output format flexible with sensible defaults. Also to adhere to PyTorch's
Yes, this would be great! (And is very much needed for our ongoing research work; we just haven't had time to work on it.) I agree that it could be nice to allow for different output formats, but I think we should start with zarr + xarray. That can also easily be converted to other formats. The only place actual forecasts are saved right now is indeed `neural_lam/models/ar_model.py` line 491 (at c3c1722).
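For example, converting an existing zarr store to another format with xarray is a one-liner (paths here are purely illustrative):

```python
import xarray as xr

# Open the saved forecasts and convert to another format, e.g. NetCDF
ds = xr.open_zarr("forecasts.zarr")
ds.to_netcdf("forecasts.nc")
```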
I have a script that saves output from neural-lam to zarr on the global branch: https://github.com/mllam/neural-lam/blob/prob_model_global/neural_lam/forecast_to_xarr.py. It's a bit specific to that setting, but could maybe be useful as a starting point. In there I introduced arguments for saving and for filtering what to save (https://github.com/mllam/neural-lam/blob/ac1bd6a5026104b58f4946c3310e639fa5c8a9a2/train_model.py#L239-259), but other approaches are definitely possible.
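Roughly, those arguments look something like the sketch below; the flag names here are hypothetical stand-ins, the actual names are in the linked train_model.py.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flags controlling whether and what to save; see the linked
# train_model.py for the actual argument names used on that branch.
parser.add_argument(
    "--save_forecast_zarr",
    type=str,
    default=None,
    help="If given, save test forecasts as a zarr store at this path",
)
parser.add_argument(
    "--save_vars",
    nargs="*",
    default=None,
    help="Subset of variables to include in the saved forecasts (default: all)",
)
args = parser.parse_args()
```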
Summarizing a discussion with @leifdenby @joeloskarsson @TomasLandelius about this PR: we should prioritize a bit, and @sadamov's previous comment was asking for changes that are too broad. For now we need three things to run experiments, which we suggest tackling in this PR. Let us know, @khintz, if you agree:
Since everyone is already tagged in this message, remind me of anything I forgot.
I plan to implement this using the existing `create_dataarray_from_tensor` function. Here's the detailed plan:
I have a couple of questions before implementing the changes. Maybe @leifdenby @joeloskarsson or @sadamov can help me with this.
A thing to keep in mind when working on this is that we want to preserve all the current functionality for running the whole evaluation in the test step, with saving a zarr dataset as an alternative. So in practice you then have two versions of the test loop:
I don't think there is a need to do 1 above when you are saving forecasts. I don't know if it's a bad idea to give the test step these dual purposes? It is a bit unintuitive that the test step is used for saving forecasts, as that is not what testing is. But it might be the best solution. It is also possible to use the approach in `neural_lam/forecast_to_xarr.py` line 341 (at ac1bd6a).
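One way to keep both behaviours is a simple branch in the test step. A rough sketch, where the `save_forecast_path` attribute and the two helpers are hypothetical and the `common_step` return values are an assumption:

```python
def test_step(self, batch, batch_idx):
    # Run the forward rollout as today; exact return values of common_step
    # are an assumption here.
    prediction, target = self.common_step(batch)

    if self.save_forecast_path is not None:
        # Alternative path: only convert and write the forecast, skip metrics/plots
        self._save_batch_as_zarr(prediction, batch, batch_idx)  # hypothetical helper
        return

    # Default path: the existing evaluation (metrics, example plots, ...)
    self._evaluate_batch(prediction, target, batch_idx)  # hypothetical helper
```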
I think being able to give a path using e.g.
Don't have any strong opinions about this, but it's probably enough to chunk over time. I think we could just fix a reasonable chunking strategy. Another question is whether this should save an
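As a sketch, fixing the chunking when writing could look like this; the dimension names, variable name, and store path are assumptions for illustration:

```python
import numpy as np
import xarray as xr

# Dummy forecast with dims (time, grid_index, variable); in neural-lam this
# DataArray would come from the model's predictions instead.
forecast_da = xr.DataArray(
    np.zeros((4, 100, 3), dtype="float32"),
    dims=("time", "grid_index", "variable"),
    name="prediction",
)

# Fix a simple chunking: one chunk per time step, full extent otherwise
encoding = {
    "prediction": {
        "chunks": (1, forecast_da.sizes["grid_index"], forecast_da.sizes["variable"])
    }
}
forecast_da.to_dataset().to_zarr("forecast.zarr", mode="w", encoding=encoding)
```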
Thanks a lot for looking into this, @elbdmi! I agree with many of the things you and @joeloskarsson have discussed above. Below are a few points that came to my mind after reading through it:
Could we, instead of creating a zarr group for batches, just append to the zarr dataset along the
As Joel correctly mentions, this is only possible for
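For reference, appending per batch with xarray/zarr would look roughly like the sketch below; the store path, dimension names, and the `start_time` dimension are assumptions:

```python
import numpy as np
import xarray as xr

for batch_idx in range(3):
    # Dummy batch of forecasts; dims and the "start_time" name are illustrative
    ds_batch = xr.Dataset(
        {"prediction": (("start_time", "time", "grid_index"), np.zeros((2, 4, 100)))},
        coords={"start_time": batch_idx * 2 + np.arange(2)},
    )
    if batch_idx == 0:
        # First batch initialises the store
        ds_batch.to_zarr("forecast.zarr", mode="w")
    else:
        # Later batches are appended along start_time instead of creating new groups
        ds_batch.to_zarr("forecast.zarr", mode="a", append_dim="start_time")
```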
When running in evaluation mode, figures are created. Currently, there is no option to output the predictions.
I would like to implement the option to output predictions as a Zarr dataset.
(WIP: https://github.com/khintz/neural-lam/tree/feature/output).
This should be possible by converting the `predictions` tensor from the function `test_step` to a Zarr dataset. This could be done by utilising the existing function `create_dataarray_from_tensor` (https://github.com/mllam/neural-lam/blob/main/neural_lam/weather_dataset.py#L509) and then simply using xarray to output to zarr.
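A minimal sketch of that flow, using a standalone conversion for illustration; in practice the DataArray construction would go through `create_dataarray_from_tensor`, and the dimension names, variable names, and function signature here are assumptions:

```python
import numpy as np
import torch
import xarray as xr


def predictions_to_zarr(predictions: torch.Tensor, times, var_names, out_path):
    """Convert a predictions tensor from test_step into a zarr store.

    Dimension names and the argument list are illustrative; in neural-lam the
    DataArray construction would instead use create_dataarray_from_tensor,
    which already knows the datastore's coordinates.
    """
    da = xr.DataArray(
        predictions.detach().cpu().numpy(),
        dims=("time", "grid_index", "variable"),
        coords={"time": times, "variable": var_names},
        name="prediction",
    )
    # Chunk along time and write the store (overwriting any existing one)
    da.to_dataset().chunk({"time": 1}).to_zarr(out_path, mode="w")
    return da


# Example usage with dummy data
preds = torch.zeros(4, 100, 3)
times = np.arange(4)
predictions_to_zarr(preds, times, ["t2m", "u10", "v10"], "forecast.zarr")
```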