
Error when adding string array as a dimension coordinate #5

Open

dnerini (Collaborator) opened this issue Jun 1, 2021 · 0 comments

dnerini commented Jun 1, 2021

What happened:
Adding a string array as a dimension coordinate breaks serving the data through a web service: clients can still read the metadata, but any attempt to fetch the actual values raises an IndexError.

Minimal Complete Verifiable Example:

import numpy as np
import opendap_protocol as dap

# Dimension coordinates: one numeric, one string-typed
x = dap.Array(name='x', data=np.array([90, 91, 92]), dtype=dap.Int16)
y = dap.Array(name='y', data=np.array(['a', 'b', 'c']), dtype=dap.String)

# 3x3 data variable on the (x, y) grid
data_array = dap.Grid(name='data', data=np.random.rand(3, 3),
                      dtype=dap.Float64, dimensions=[x, y])

dataset = dap.Dataset(name='Example')
dataset.append(x, y, data_array)

import urllib
from flask import Flask, Response, request

app = Flask(__name__)

@app.route('/dataset.dds', methods=['GET'])
def dds_response():
    # Retrieve constraints from the request to handle slicing, etc.
    constraint = urllib.parse.urlsplit(request.url)[3]
    return Response(
        dataset.dds(constraint=constraint),
        mimetype='text/plain')

@app.route('/dataset.das', methods=['GET'])
def das_response():
    constraint = urllib.parse.urlsplit(request.url)[3]
    return Response(
        dataset.das(constraint=constraint),
        mimetype='text/plain')

@app.route('/dataset.dods', methods=['GET'])
def dods_response():
    constraint = urllib.parse.urlsplit(request.url)[3]
    return Response(
        dataset.dods(constraint=constraint),
        mimetype='application/octet-stream')

app.run(debug=True)

And from a Python interpreter:

>>> import netCDF4 as nc
>>> data = nc.Dataset('http://localhost:5000/dataset')
>>> data
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format DAP2):
    dimensions(sizes): maxStrlen64(64), x(3), y(3)
    variables(dimensions): int16 x(x), |S1 y(y,maxStrlen64), float64 data(x,y)
    groups: 
>>> data.variables
OrderedDict([('x', <class 'netCDF4._netCDF4.Variable'>
int16 x(x)
unlimited dimensions: 
current shape = (3,)
filling on, default _FillValue of -32767 used
), ('y', <class 'netCDF4._netCDF4.Variable'>
|S1 y(y, maxStrlen64)
unlimited dimensions: 
current shape = (3, 64)
filling on, default _FillValue of  used
), ('data', <class 'netCDF4._netCDF4.Variable'>
float64 data(x, y)
unlimited dimensions: 
current shape = (3, 3)
filling on, default _FillValue of 9.969209968386869e+36 used
)])
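Note the shape reported for y: netCDF3/DAP2 has no variable-length string type, so each label is exposed as a row of single characters padded to maxStrlen64. A pure-Python sketch of that layout (the names labels and max_strlen are illustrative, not part of any API):

```python
labels = ['a', 'b', 'c']
max_strlen = 64  # matches the maxStrlen64 dimension above

# Each string becomes one row of single bytes, null-padded to max_strlen,
# which is why y is reported as |S1 with current shape = (3, 64).
chars = [list(s.encode().ljust(max_strlen, b'\x00')) for s in labels]
print(len(chars), len(chars[0]))  # 3 64
```

So the client-side metadata looks plausible; the problem only appears once data bytes are requested.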

So far so good, but when trying to fetch any of the actual data, an IndexError is raised:

>>> data.variables["x"][...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "netCDF4/_netCDF4.pyx", line 4351, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5291, in netCDF4._netCDF4.Variable._get
IndexError: index exceeds dimension bounds
>>> data.variables["y"][...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "netCDF4/_netCDF4.pyx", line 4351, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5291, in netCDF4._netCDF4.Variable._get
IndexError: index exceeds dimension bounds
>>> data.variables["data"][...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "netCDF4/_netCDF4.pyx", line 4351, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5291, in netCDF4._netCDF4.Variable._get
IndexError: index exceeds dimension bounds
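The failure pattern (metadata fine, every data fetch failing) suggests the byte count in the .dods body does not match the shape the server advertises. For context, DAP2 serializes data in XDR (RFC 4506), where a string is a 4-byte big-endian length followed by the payload, zero-padded to a 4-byte boundary. A minimal sketch of that encoding; `xdr_string` is a hypothetical helper for illustration, not part of opendap_protocol:

```python
import struct

def xdr_string(s: bytes) -> bytes:
    """Encode one string as XDR: 4-byte big-endian length,
    then the payload, zero-padded to a multiple of 4 bytes."""
    pad = (4 - len(s) % 4) % 4
    return struct.pack('>I', len(s)) + s + b'\x00' * pad

# 'a' -> length 1, one payload byte, three padding bytes (8 bytes total)
print(xdr_string(b'a'))
```

If the server emitted differently sized bytes for the String coordinate than the DDS shape implies, a client computing offsets from the DDS would run past the end of the buffer, which would be consistent with the IndexError above. This is a hypothesis, not a confirmed diagnosis.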

As a consequence, xarray also fails to read the dataset:

>>> import xarray as xr
>>> xr.open_dataset('http://localhost:5000/dataset')
Traceback (most recent call last):
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 104, in _getitem
    array = getitem(original_array, key)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/backends/common.py", line 65, in robust_getitem
    return array[key]
  File "netCDF4/_netCDF4.pyx", line 4351, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 5291, in netCDF4._netCDF4.Variable._get
IndexError: index exceeds dimension bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/backends/api.py", line 496, in open_dataset
    backend_ds = backend.open_dataset(
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 563, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/backends/store.py", line 37, in open_dataset
    ds = Dataset(vars, attrs=attrs)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/dataset.py", line 739, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/merge.py", line 477, in merge_data_and_coords
    return merge_core(
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/merge.py", line 623, in merge_core
    collected = collect_variables_and_indexes(aligned)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/merge.py", line 287, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/variable.py", line 166, in as_variable
    obj = obj.to_index_variable()
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/variable.py", line 536, in to_index_variable
    return IndexVariable(
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/variable.py", line 2540, in __init__
    self._data = PandasIndex(self._data)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/indexes.py", line 76, in __init__
    self.array = utils.safe_cast_to_index(array)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/utils.py", line 114, in safe_cast_to_index
    index = pd.Index(np.asarray(array), **kwargs)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/numpy/core/_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/indexing.py", line 572, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 91, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/core/indexing.py", line 863, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/prod/zue/fc_development/users/ned/local_share/virtualenvs/preproc-pipelines-HVmgC6OK/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 114, in _getitem
    raise IndexError(msg)
IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().

What you expected to happen:

Through netCDF4:

>>> data = nc.Dataset('http://localhost:5000/dataset')
>>> data.variables["x"][...]
masked_array(data=[90, 91, 92],
             mask=False,
       fill_value=999999,
            dtype=int16)
>>> data.variables["y"][...]
masked_array(data=['a', 'b', 'c'],
             mask=False,
       fill_value='N/A',
            dtype='<U1')

and through xarray:

>>> xr.open_dataset('http://localhost:5000/dataset')
<xarray.Dataset>
Dimensions:  (x: 3, y: 3)
Coordinates:
  * x        (x) int16 90 91 92
  * y        (y) <U1 'a' 'b' 'c'
Data variables:
    data     (x, y) float64 ...

Environment:

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.4 (default, Aug 21 2020, 11:28:17) 
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.4.0-210-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.8.16
libnetcdf: 4.4.0

xarray: 0.18.2
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.0.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.05.1
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 56.0.0
pip: 21.0.1
conda: None
pytest: None
IPython: None
sphinx: None