Output predictions as zarr dataset #89

Open
khintz opened this issue Nov 21, 2024 · 7 comments

khintz (Contributor) commented Nov 21, 2024

When running in evaluation mode, figures are created, but there is currently no option to output the predictions themselves.
I would like to implement the option to output predictions as a Zarr dataset.
(WIP: https://github.com/khintz/neural-lam/tree/feature/output).

This should be possible by converting the prediction tensor produced in test_step into a Zarr dataset:

    def test_step(self, batch, batch_idx):
        """
        Run test on single batch
        """
        prediction, target, pred_std, batch_times = self.common_step(batch)

This could be done by utilising the existing function create_dataarray_from_tensor
(https://github.com/mllam/neural-lam/blob/main/neural_lam/weather_dataset.py#L509)
and then simply using xarray to write to zarr.
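
For illustration, a rough sketch of what that could look like inside test_step (the exact signature of create_dataarray_from_tensor, how the dataset object is reached from the model, and the output path are assumptions here, not the actual neural-lam API):

    def test_step(self, batch, batch_idx):
        """
        Run test on single batch and also save the prediction as zarr
        """
        prediction, target, pred_std, batch_times = self.common_step(batch)

        # Assumed call; the real create_dataarray_from_tensor in
        # weather_dataset.py may take different arguments
        da_pred = self.ds.create_dataarray_from_tensor(
            tensor=prediction.cpu(), time=batch_times, category="state"
        )

        # Placeholder store path; mode="a" keeps earlier batches. The
        # exact layout (groups vs. appending along time) is discussed
        # further down in this thread.
        da_pred.to_dataset(name="prediction").to_zarr(
            "predictions.zarr", group=f"batch_{batch_idx}", mode="a"
        )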

sadamov (Collaborator) commented Nov 21, 2024

This is a great idea. Some thoughts that come to mind:

  • Given that we have implemented a flexible datastore setup for various input datatypes, it might make sense to offer the same flexibility in the output. So basically the mdp datastore returns zarr and the npy datastore returns a numpy array by default. The user can, however, always override this default and output to any given format.
  • There actually is a way to output predictions right now, hardcoded as pytorch tensors:
    # Save pred and target as .pt files
    I don't think it's good to have this inside plot_example; rather, it could be replaced by your new functionality (i.e. output as tensors without any metadata, for debugging).
  • We are currently running inference and auto-regressive rollouts within pytorch's on_test_... logic, but pytorch actually provides an on_predict_... set of hooks. This PR suggested changing this and making sure that predict_step() is properly used: https://github.com/mllam/neural-lam/pull/45/files Currently, neural-lam overrides predict_step and uses it during training.
  • Here is a first attempt at prediction output: https://github.com/mllam/neural-lam/pull/44/files It is based specifically on GRIB, but maybe still somewhat useful.

So to summarize: this is clearly required. I suggest making the output format flexible with sensible defaults, adhering to pytorch's predict_... logic (maybe in a separate PR), and finally getting rid of the two draft PRs listed above.

joeloskarsson (Collaborator) commented

Yes, this would be great! (And it is very much needed for our ongoing research work; we just haven't had time to put time into it.) I agree that it could be nice to allow for different output formats, but I think we should start with zarr+xarray. That can also easily be converted to other things.

The only saving of actual forecasts right now is indeed

# Save pred and target as .pt files

But note that this only saves one single example, so it was not intended for saving all forecasts in the test set for later evaluation. I'm not sure if there is a need/interest in saving pytorch tensors for that.

I have a script that saves output from neural-lam to zarr on the global branch: https://github.com/mllam/neural-lam/blob/prob_model_global/neural_lam/forecast_to_xarr.py It's a bit specific to that setting, but could maybe be a useful starting point. In there I introduced arguments for saving and filtering what to save (https://github.com/mllam/neural-lam/blob/ac1bd6a5026104b58f4946c3310e639fa5c8a9a2/train_model.py#L239-259), but other approaches are definitely possible.

leifdenby self-assigned this Nov 26, 2024
sadamov mentioned this issue Dec 3, 2024
sadamov (Collaborator) commented Dec 5, 2024

Summarizing a discussion with @leifdenby, @joeloskarsson and @TomasLandelius about this PR: we should prioritize a bit, and @sadamov's previous comment was asking for changes that are too broad. For now we need three things to run experiments, which we suggest tackling in this PR. Let us know @khintz if you agree:

  • Generate the same xarray objects that go into the model, after they come out of the model (Leif has already done work on the visualisation side of things)
  • Perform some stacking along ar_step and batch (e.g. each time sample then has a forecast dimension lead_time); see the sketch after this list
  • Write the stacked object to disk, sensibly chunked and compressed

Since everyone is already tagged in this message, please remind me of anything I forgot.
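
As an illustration of the second point above, a sketch of wrapping a prediction array with a forecast dimension lead_time (all shapes and names here are illustrative assumptions, not the actual neural-lam interfaces):

    import numpy as np
    import xarray as xr

    def stack_predictions(prediction, start_times, var_names):
        # Assumed shape: (batch, ar_steps, num_grid_nodes, num_vars),
        # where batch element i is a forecast started at start_times[i]
        batch_size, ar_steps, num_grid, num_vars = prediction.shape
        return xr.DataArray(
            prediction,
            dims=("start_time", "lead_time", "grid_index", "variable"),
            coords={
                "start_time": start_times,
                # lead time as step index; could also be timedeltas
                "lead_time": np.arange(1, ar_steps + 1),
                "variable": var_names,
            },
        )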


elbdmi self-assigned this Dec 5, 2024
joeloskarsson added this to the v 0.5.0 (proposed) milestone Dec 10, 2024
elbdmi commented Dec 11, 2024

I plan to implement this using the existing create_dataarray_from_tensor function. Here's the detailed plan:

  1. Prediction Conversion: In the test_step method, predictions will be converted to xarray.DataArray objects using create_dataarray_from_tensor.
  2. Zarr Saving: Each batch will be saved under a unique group in the Zarr dataset (e.g., batch_0_predictions, batch_1_predictions).
  3. Testing: After implementation, I will test the changes to confirm correct functionality and proper organization of the Zarr dataset, and add a simple automated test.

I have a couple of questions before implementing the changes. Maybe @leifdenby @joeloskarsson or @sadamov can help me with this.

  1. Output Path for Zarr File: Where should the Zarr file be saved? Should it be a fixed location specified in the code, or do we want to make this configurable via a command-line argument or configuration file?
  2. Chunking Strategy: Could you clarify how you’d like the chunking to be implemented? Should the chunks be predefined with specific dimensions (e.g., time, latitude, longitude), or should the chunk sizes adapt based on the dataset?

joeloskarsson (Collaborator) commented

A thing to keep in mind when working on this is that we want to preserve all the current functionality to run the whole evaluation in the test step, with saving a zarr dataset as an alternative. So in practice you then have two versions of the test loop:

  1. Compute metrics for forecasts and log these
  2. Save forecasts to zarr

I don't think there is a need to do 1 above when you are saving forecasts. I don't know if it's a bad idea to have these dual purposes for the test step? It is a bit unintuitive that the test step is used for saving forecasts, as that is not what testing is. But it might be the best solution. It is also possible to use the predict_step as mentioned by @sadamov above, with something like

    for batch in tqdm(dataloader):

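Completing that fragment, roughly (assuming a LightningModule with a proper predict_step and an existing dataloader; the zarr writing would happen inside the loop):

    import torch
    from tqdm import tqdm

    model.eval()
    with torch.no_grad():
        for batch_idx, batch in enumerate(tqdm(dataloader)):
            prediction = model.predict_step(batch, batch_idx)
            # ... convert to xarray here and append to the zarr store
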
> Output Path for Zarr File: Where should the Zarr file be saved? Should it be a fixed location specified in the code, or do we want to make this configurable via a command-line argument or configuration file?

I think being able to give a path using e.g. --forecast_save_path would be nice.
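For example (a sketch; the actual argument name and wiring in train_model.py are up for discussion):

    parser.add_argument(
        "--forecast_save_path",
        type=str,
        default=None,
        help="If set, save test forecasts as a zarr dataset at this path",
    )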

> Chunking Strategy: Could you clarify how you’d like the chunking to be implemented? Should the chunks be predefined with specific dimensions (e.g., time, latitude, longitude), or should the chunk sizes adapt based on the dataset?

Don't have any strong opinions about this, but it is probably enough to chunk over time. I think we could just fix a reasonable chunking strategy. Another question is whether this should save an xarray.DataArray or an xarray.Dataset (practically, whether variables should be in different files or not). I think a Dataset would make most sense. Another consideration is whether the returned zarr should have 1 spatial dimension (grid-nodes) or 2 (x, y). We can't always guarantee that 2 spatial dimensions can be created (only if the datastore is a BaseRegularGridDatastore, I think?). Much of this depends on the further use of the zarr output for evaluation. I think @sadamov has some useful input wrt that.
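
To illustrate the 1 vs. 2 spatial dimensions point: if x and y coordinates are stored along the stacked dimension, plain xarray can unstack it (a sketch, independent of the datastore API):

    # da has a stacked dimension "grid_index" carrying 1D coordinates
    # "x" and "y" (only recoverable for regular-grid datastores)
    da_regular = da.set_index(grid_index=["x", "y"]).unstack("grid_index")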

sadamov (Collaborator) commented Dec 12, 2024

Thanks a lot for looking into this @elbdmi! I agree with many of the things you and @joeloskarsson have discussed above. Below are a few points that came to my mind after reading through it:

> Zarr Saving: Each batch will be saved under a unique group in the Zarr dataset (e.g., batch_0_predictions, batch_1_predictions).

  1. Could we, instead of creating a zarr group for each batch, just append to the zarr dataset along dim=time, since the batching is done across time? The first batch would then create the zarr archive (mode="w") and the following batches would be appended along time (mode="a"), as in the sketch below.
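
A minimal sketch of that write pattern with xarray (the store path is a placeholder), chunking by single time steps as suggested in point 3 below:

    import xarray as xr

    def write_batch(ds: xr.Dataset, store: str, is_first: bool):
        # one chunk per time step matches the append pattern
        ds = ds.chunk({"time": 1})
        if is_first:
            # first batch creates the zarr archive
            ds.to_zarr(store, mode="w")
        else:
            # following batches are appended along the time dimension
            ds.to_zarr(store, mode="a", append_dim="time")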

  2. I agree that an xarray Dataset would fit this output perfectly (better than a DataArray). Below is the repr of a zarr format that I currently use for verification. Since this is Danra data, it's probably a good standard to use for the output as well:
    <xarray.Dataset> Size: 751MB
    Dimensions:  (time: 100, y: 589, x: 789)
    Coordinates:
        lat      (y, x) float64 4MB dask.array<chunksize=(589, 789), meta=np.ndarray>
        lon      (y, x) float64 4MB dask.array<chunksize=(589, 789), meta=np.ndarray>
      * time     (time) datetime64[ns] 800B 2020-09-01 ... 2020-09-13T09:00:00
      * x        (x) float64 6kB -1.999e+06 -1.997e+06 ... -3.175e+04 -2.925e+04
      * y        (y) float64 5kB -6.095e+05 -6.07e+05 ... 8.58e+05 8.605e+05
    Data variables:
        u10      (time, y, x) float64 372MB dask.array<chunksize=(20, 589, 789), meta=np.ndarray>
        v10      (time, y, x) float64 372MB dask.array<chunksize=(20, 589, 789), meta=np.ndarray>
    Attributes:
        description:  All prognostic variables for 10-year period on reduced levels

As Joel correctly mentions, this is only possible for a BaseRegularGridDatastore. So maybe it's easiest for now to always return one spatial dimension (grid-nodes), i.e. stacked. We just need to make sure that we have all the required information to unstack the output in the verification notebook. I think we can again use existing functionality from the datastores to stack/unstack here?

  3. I would chunk across time and, in the future, also across ensemble_member once we allow that. There is an argument to be made for also chunking across vertical levels, as they are treated as independent 2D fields during verification. The optimal chunking will depend on the xy-resolution. For now, I would only chunk by chunk({"time": 1}) and make this our defined chunking for the output. Chunking by time is also very natural when you create the zarr archive by appending along the time dim, as mentioned in (1.).

  4. With regards to "overloading" the test_step vs. re-writing the logic with a predict_step: I leave that up to you; it highly depends on how much time you have to make the code "beautiful" vs. "functional". Both options work for me. 👍 I would choose the option that doesn't take weeks to implement.
