mask_fillvalues and mask_multimodel recompute mask from all input cubes for every output cube #2521

Open
bouweandela opened this issue Sep 10, 2024 · 3 comments · May be fixed by #2522
Labels
dask (related to improvements using Dask), preprocessor (Related to the preprocessor)

Comments

@bouweandela
Member

Because mask_fillvalues and mask_multimodel are now lazy, they recompute the mask from all the input cubes for every output cube. This is slow and unnecessary because the mask is the same for every output cube.
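To make the cost concrete, here is a minimal toy sketch in plain Python. It does not use Dask, Iris, or ESMValCore; closures stand in for lazy arrays, `None` stands in for a masked point, and every name in it (`read_input`, `lazy_combined_mask`, `lazy_output`) is hypothetical. It also assumes that the combined mask hides a point whenever any input masks it. Under those assumptions, computing the outputs one at a time reads every input once per output:

```python
# Toy model only: closures play the role of lazy arrays, None marks a masked point.
read_count = 0  # counts how often an input is "read from disk"


def read_input(values):
    """Pretend to load one input from disk and count the access."""
    global read_count
    read_count += 1
    return values


def lazy_combined_mask(inputs):
    """Return a thunk that builds the combined mask from all inputs.

    Assumption: a point is masked if it is masked (None) in any input.
    """
    def compute():
        loaded = [read_input(values) for values in inputs]
        return [any(x is None for x in column) for column in zip(*loaded)]
    return compute


def lazy_output(own_values, mask_thunk):
    """Return a thunk that applies the combined mask to one dataset."""
    def compute():
        mask = mask_thunk()  # re-evaluated on every call, nothing is cached
        return [None if masked else x for x, masked in zip(own_values, mask)]
    return compute


inputs = [[1, None, 3], [4, 5, None], [7, 8, 9]]
mask = lazy_combined_mask(inputs)
outputs = [lazy_output(values, mask) for values in inputs]

# Saving the outputs one at a time triggers the full mask computation each time.
results = [output() for output in outputs]
print(read_count)  # 3 outputs x 3 inputs
```

The number of reads grows as the number of outputs times the number of inputs, even though every output uses the same mask.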

@bouweandela added the preprocessor and dask labels on Sep 10, 2024
@valeriupredoi
Contributor

valeriupredoi commented Oct 22, 2024

could you maybe tell us more about the process, pls, bud? mask_multimodel calls _multimodel_mask_cubes(cubes, shape) or the equivalent for products, where a composite mask is built from the mask of each cube in cubes, so the iteration is needed because each cube has a different mask. Are you saying this iteration is done for each cube, a la:

for cube in cubes:
    _multimodel_mask_cubes(cubes, shape)  # that will, in turn, loop over cubes again

?

@bouweandela
Member Author

Each output file requires a lazy mask that can be computed from all input files, so that means all the input files must be read to save a single output file. Because the output files are currently saved (and computed) one at a time, that means all the input data needs to be read as many times as there are output files. Is that any more clear?
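One conceivable way around this is to make sure the shared mask is evaluated only once and reused by every output. The sketch below is again a plain-Python toy under stated assumptions, not Dask code and not necessarily what the linked fix does: `functools.lru_cache` stands in for whatever mechanism shares the intermediate result, `None` marks a masked point, the combined mask is assumed to hide a point whenever any input masks it, and the names `read_input` and `make_shared_mask` are hypothetical.

```python
from functools import lru_cache

reads = 0  # counts how often an input is "read from disk"


def read_input(values):
    """Pretend to load one input from disk and count the access."""
    global reads
    reads += 1
    return values


def make_shared_mask(inputs):
    """Return a thunk whose result is computed once and then reused.

    Assumption: a point is masked if it is masked (None) in any input.
    """
    @lru_cache(maxsize=None)
    def compute():
        loaded = [read_input(values) for values in inputs]
        return tuple(any(x is None for x in column) for column in zip(*loaded))
    return compute


inputs = [[1, None, 3], [4, 5, None], [7, 8, 9]]
shared_mask = make_shared_mask(inputs)

# Every output asks for the mask, but the inputs are only read the first time.
results = [
    [None if masked else x for x, masked in zip(values, shared_mask())]
    for values in inputs
]
print(reads)  # each of the 3 inputs is read once
```

With the mask shared, the number of reads equals the number of inputs and no longer depends on how many outputs are saved.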

@valeriupredoi
Contributor

thanks, bud! I need to read this carefully tomorrow, am just about to go home and shove a pizza in the oven 🍕
