I was unable to open a collection of zipped tiffs using the OpenURLWithFSSpec recipe.
I tested the semantics with fsspec and xarray, and everything worked in my test notebook but failed when I built them into a pangeo-forge recipe:
SystemError: <class 'rasterio._err.CPLE_OpenFailedError'> returned a result with an exception set [while running 'Create|OpenURLWithFSSpec|OpenWithXarray|Preprocess|StoreToZarr/OpenWithXarray/Open with Xarray']
In the end, I was able to work around and open the tiffs directly with rioxarray (recipe.py); however, I believe it would be better if the recipes worked as intended.
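For reference, one possible shape for that kind of workaround is to unpack the GeoTIFF to local disk before handing it to rioxarray. This is only a minimal sketch and not necessarily what recipe.py does; the helper name `extract_first_tiff`, the member file name used in the demonstration, and the assumption that each archive contains at least one `.tif` member are mine.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_first_tiff(zip_bytes: bytes, dest_dir: str) -> Path:
    """Extract the first .tif/.tiff member of a zip archive and return its path.

    Hypothetical helper: assumes the archive holds at least one GeoTIFF.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        tiff_names = [
            name for name in zf.namelist()
            if name.lower().endswith(('.tif', '.tiff'))
        ]
        if not tiff_names:
            raise ValueError(f'no GeoTIFF found in archive; members: {zf.namelist()}')
        # ZipFile.extract returns the path of the file it wrote.
        return Path(zf.extract(tiff_names[0], path=dest_dir))


# Demonstration with an in-memory archive. The member name and its
# contents are placeholders, not a real GeoTIFF from the USGS server.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('readme.txt', 'metadata')
    zf.writestr('det2001001.modisSSEBopETactual.tif', b'placeholder')

with tempfile.TemporaryDirectory() as tmp:
    tiff_path = extract_first_tiff(buf.getvalue(), tmp)
    extracted_name = tiff_path.name
```

With a real archive, the returned path could then be passed to `rioxarray.open_rasterio(path)`, so rasterio receives a plain local file rather than a file-like object.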
Here's an example that will generate the error. I believe I have isolated the problem to OpenURLWithFSSpec: leaving out OpenWithXarray yields the same error, so the problem seems to lie either with OpenURLWithFSSpec or with xarray.
from datetime import date

import apache_beam as beam
import pandas as pd
import xarray as xr
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.transforms import Indexed, OpenURLWithFSSpec, OpenWithXarray, StoreToZarr, T

# note that the file pattern differs from the working example
input_url_pattern = (
    'https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/uswem/web/'
    'conus/eta/modis_eta/daily/downloads/'
    'det{yyyyjjj}.modisSSEBopETactual.zip'
)

start = date(2001, 1, 1)
end = date(2022, 10, 7)
dates = pd.date_range(start, end, freq='1D')


def make_url(time: pd.Timestamp) -> str:
    return input_url_pattern.format(yyyyjjj=time.strftime('%Y%j'))


pattern = FilePattern(make_url,
                      ConcatDim(name='time', keys=dates, nitems_per_file=1))
pattern = pattern.prune()


class Preprocess(beam.PTransform):
    """Preprocessor transform."""

    @staticmethod
    def _preproc(item: Indexed[T]) -> Indexed[xr.Dataset]:
        import numpy as np
        import rioxarray

        index, f = item
        time_dim = index.find_concat_dim('time')
        time_index = index[time_dim].value
        time = dates[time_index]

        da = rioxarray.open_rasterio(f.open()).drop('band')
        da = da.rename({'x': 'lon', 'y': 'lat'})
        ds = da.to_dataset(name='aet')
        ds = ds.expand_dims(time=np.array([time]))

        return index, ds

    def expand(self, pcoll: beam.PCollection) -> beam.PCollection:
        return pcoll | beam.Map(self._preproc)


recipe = (
    beam.Create(pattern.items())
    | OpenURLWithFSSpec(open_kwargs={'compression': 'zip'})
    | OpenWithXarray(xarray_open_kwargs={'engine': 'rasterio'})
    | Preprocess()
    | StoreToZarr(
        store_name='us-ssebop.zarr',
        target_root='.',
        combine_dims=pattern.combine_dim_keys,
        target_chunks={'time': 1, 'lat': int(2834 / 2), 'lon': int(6612 / 6)},
    )
)

with beam.Pipeline() as p:
    p | recipe
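To rule out a malformed URL as the cause, the URL template can be checked on its own, without Beam or pangeo-forge. The sketch below uses only the standard library. It swaps `pd.Timestamp` for `datetime.date` (my simplification; both support `strftime('%Y%j')`), and the names `first_url` and `last_url` are illustrative.

```python
from datetime import date

input_url_pattern = (
    'https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/uswem/web/'
    'conus/eta/modis_eta/daily/downloads/'
    'det{yyyyjjj}.modisSSEBopETactual.zip'
)


def make_url(day: date) -> str:
    # %Y is the four-digit year, %j the zero-padded day of the year (001-366).
    return input_url_pattern.format(yyyyjjj=day.strftime('%Y%j'))


first_url = make_url(date(2001, 1, 1))   # ends with det2001001.modisSSEBopETactual.zip
last_url = make_url(date(2022, 10, 7))   # ends with det2022280.modisSSEBopETactual.zip
```

If these URLs download correctly in a browser or with `fsspec.open`, the failure is more likely in how the opened file object is handed to rasterio than in the file pattern itself.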