TaskLoader fails when declaring multiple target_delta_t #129

Closed
acocac opened this issue Sep 18, 2024 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@acocac
Member

acocac commented Sep 18, 2024

I have started experimenting with a forecasting setup in DeepSensor (see an MWE in Colab). The example below shows how I define a TaskLoader to predict air temperature over the next two days (lead times of 1 and 2 days):

task_loader = TaskLoader(
    context=[era5_ds["air"],] * 3,
    context_delta_t=[-1, -2, 0],
    target=[era5_ds["air"],era5_ds["air"]],
    target_delta_t=[1, 2],
    time_freq="D",  # daily frequency (the default)
)
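
To make the offsets concrete, here is a minimal sketch of which dates each set would be drawn from for a single task date. It assumes each delta_t value is a multiple of time_freq added to the task date; offset_dates is a hypothetical helper for illustration, not part of DeepSensor.

```python
from datetime import date, timedelta


def offset_dates(task_date, delta_ts, step=timedelta(days=1)):
    """Hypothetical helper: shift task_date by each delta_t, in units of `step`."""
    return [task_date + dt * step for dt in delta_ts]


task_date = date(2020, 1, 10)  # arbitrary example date

# Same offsets as in the TaskLoader definition above
context_dates = offset_dates(task_date, [-1, -2, 0])
target_dates = offset_dates(task_date, [1, 2])

print("context:", [d.isoformat() for d in context_dates])
print("target: ", [d.isoformat() for d in target_dates])
```

Under that assumption, the three context sets come from the previous day, two days earlier, and the task date itself, while the two target sets come from the following two days.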

I then reuse the training procedure suggested in the DeepSensor tutorials. However, training stops with an error when computing the RMSE for the validation tasks.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-d73321d37ac8> in <cell line: 16>()
     18     batch_losses = trainer(train_tasks)
     19     losses.append(np.mean(batch_losses))
---> 20     val_rmses.append(compute_val_rmse(model, val_tasks))
     21     if val_rmses[-1] < val_rmse_best:
     22         val_rmse_best = val_rmses[-1]

1 frames
/usr/local/lib/python3.10/dist-packages/deepsensor/data/processor.py in map_array(self, data, var_ID, method, unnorm, add_offset)
    516             c = -c / m
    517             m = 1 / m
--> 518         data = data * m
    519         if add_offset:
    520             data = data + c

TypeError: can't multiply sequence by non-int of type 'float'

My guess is that map_array needs some changes to handle multiple targets. I suggest checking the type of data in the line below: if it is a list, apply the multiplication to each element (each an np.array in this case).

data = data * m
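
A minimal sketch of the failure and of the per-element idea, using plain NumPy rather than DeepSensor. The scale function is a hypothetical stand-in for the linear step shown in the traceback, not the library's implementation, and the values of m and c are made up.

```python
import numpy as np


def scale(data, m, c=0.0, add_offset=True):
    """Hypothetical stand-in for the linear step in map_array."""
    # If given a list/tuple (one array per target set), recurse per element
    if isinstance(data, (list, tuple)):
        return [scale(d, m, c, add_offset) for d in data]
    data = data * m
    if add_offset:
        data = data + c
    return data


# Two target sets, e.g. one per lead time
targets = [np.array([0.0, 1.0]), np.array([2.0, 3.0])]

# Multiplying the Python list itself by a float reproduces the error
try:
    targets * 2.5
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Handling each array separately works
scaled = scale(targets, m=2.5, c=10.0)
print(scaled)
```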

acocac added the help wanted label Sep 18, 2024
@tom-andersson
Collaborator

Hey @acocac, thanks for raising this and for the MWE. So you have two target sets for the two lead times, and you want to compute the unnormalised RMSE in Kelvin for the first lead time. The model.predict interface is the intended way to get unnormalised predictions for computing unnormalised metrics. I've recently improved DeepSensor's forecasting functionality in deepsensor v0.4, which fixes model.predict forecast outputs; see #130 and #132.

However, in the MWE you are not using the data_processor in the right way. .map_array is intended for a single array, not a list, so I suggest we keep the interface as-is. As a workaround that keeps your current approach:

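# Don't do this (with multiple targets, model.mean(task) is a list, not a single array):
# mean = data_processor.map_array(model.mean(task), target_var_ID, unnorm=True)
# true = data_processor.map_array(task["Y_t"][0], target_var_ID, unnorm=True)
# Do this:
lead_time_idx = 0  # index of the target set (lead time) to evaluate
mean = model.mean(task)[lead_time_idx]
true = task["Y_t"][lead_time_idx]
error = np.abs(mean - true)
# Rescale the error only: the offset cancels in a difference, hence add_offset=False
error_unnormalised = data_processor.map_array(error, target_var_ID, unnorm=True, add_offset=False)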

But I'd suggest updating DeepSensor and using model.predict :-)
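
To see why add_offset=False is appropriate when unnormalising an error, here is a small NumPy sketch. It assumes a simple mean/std normalisation; mean_K, std_K and unnormalise are hypothetical and only mimic the scale-and-offset behaviour visible in the traceback. The offset cancels when two values are subtracted, so only the scale factor applies to their difference.

```python
import numpy as np

# Hypothetical normalisation statistics for air temperature (Kelvin)
mean_K, std_K = 280.0, 15.0


def unnormalise(x, add_offset=True):
    """Hypothetical inverse of x_norm = (x - mean_K) / std_K."""
    out = x * std_K
    if add_offset:
        out = out + mean_K
    return out


# Made-up normalised predictions and targets for one lead time
pred_norm = np.array([0.2, -0.4, 1.0])
true_norm = np.array([0.0, -0.1, 0.5])

# Route 1: unnormalise both values, then take the absolute difference
abs_err_from_values = np.abs(unnormalise(pred_norm) - unnormalise(true_norm))

# Route 2: take the normalised error, then rescale it without the offset
abs_err_scaled = unnormalise(np.abs(pred_norm - true_norm), add_offset=False)

rmse_K = np.sqrt(np.mean(abs_err_scaled ** 2))
print(abs_err_from_values, abs_err_scaled, rmse_K)
```

Both routes give the same errors in Kelvin. Rescaling with the offset included would instead add mean_K to every error and inflate the RMSE.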
