This repository has been archived by the owner on Feb 14, 2020. It is now read-only.

Support shape and slicing syntax for numpy compatibility #10

Open
mrocklin opened this issue Apr 30, 2018 · 1 comment

Comments

@mrocklin

Cool project. I gave it a shot with an eye towards using it with dask arrays. I have some feedback on the numpy slicing protocol.

A common API for array storage technologies is to mimic Numpy slicing syntax:

```python
>>> array[:5, ::2, 100]
... my numpy array ...
```
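For reference, this is what that expression means on a plain NumPy array: each comma-separated entry indexes one axis, a slice keeps its axis, and an integer drops it. The 3-D shape below is arbitrary and chosen only for illustration.

```python
import numpy as np

# An arbitrary 3-D array standing in for any array store (shape chosen for illustration).
a = np.arange(10 * 6 * 200).reshape(10, 6, 200)

# Axis 0: first 5 planes; axis 1: every other row (0, 2, 4); axis 2: the single index 100.
sub = a[:5, ::2, 100]

print(sub.shape)  # (5, 3): the integer index on the last axis removes that axis
```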

I'm glad to see that diced supports much of this API. This makes it much easier to interact with other libraries. After looking through the README and trying things out, I got as far as the following:

```python
from diced import DicedStore
store = DicedStore("gs://flyem-public-connectome")
repo = store.open_repo("medulla-training")
array = repo.get_array('training2-grayscale')

>>> array[0, 0, 0:5:1]
array([ 89,  95, 103, 103,  89], dtype=uint8)

>>> array.dtype
<ArrayDtype.uint8: <type 'numpy.uint8'>>
```
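The repr suggests that `array.dtype` is a Python `enum` member whose `.value` is the NumPy scalar type (the traceback further down also calls `self.dtype.value`). A minimal sketch of turning such a member into a real `numpy.dtype`; the `ArrayDtype` enum here is a hypothetical stand-in modeled on that repr, not diced's actual definition, and `as_numpy_dtype` is an illustrative helper name.

```python
from enum import Enum

import numpy as np


# Hypothetical stand-in for diced's ArrayDtype enum; string values are used
# here for simplicity, whereas the repr above shows numpy.uint8 as the value.
class ArrayDtype(Enum):
    uint8 = "uint8"
    uint64 = "uint64"


def as_numpy_dtype(d):
    """Return a real numpy.dtype for an enum member, a dtype-like, or a dtype."""
    if isinstance(d, Enum):
        d = d.value
    # np.dtype accepts strings, scalar types such as np.uint8, and dtypes.
    return np.dtype(d)


dt = as_numpy_dtype(ArrayDtype.uint8)
print(dt, dt == np.uint8, dt.itemsize)  # uint8 True 1
```

Exposing the result of `np.dtype(...)` directly means downstream code can call `.itemsize`, `.kind`, or compare against `np.uint8` without knowing about the enum.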

This is great to see! Some critical feedback:

  1. It would be good to add array.shape as well
  2. It would be useful if the dtype object were a real numpy dtype, rather than a custom diced-specific type
  3. Slicing only works if every dimension is indexed and every slice gives its start and stop explicitly:
```
In [21]: array[0]
---------------------------------------------------------------------------
DicedException                            Traceback (most recent call last)
<ipython-input-21-bdd46aa5d024> in <module>()
----> 1 array[0]

/home/mrocklin/Software/anaconda/envs/diced/lib/python2.7/site-packages/diced/DicedArray.pyc in __getitem__(self, index)
    179 
    180         if self.numdims != dimsreq:
--> 181             raise DicedException("Array has a different number of dimensions than requested")
    182 
    183         z = y = x = slice(0,1)

DicedException: Array has a different number of dimensions than requested

In [22]: array[0, 0, :5]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-ab665471e927> in <module>()
----> 1 array[0, 0, :5]

/home/mrocklin/Software/anaconda/envs/diced/lib/python2.7/site-packages/diced/DicedArray.pyc in __getitem__(self, index)
    195         zsize = z.stop - z.start
    196         ysize = y.stop - y.start
--> 197         xsize = x.stop - x.start
    198         if zsize*ysize*xsize > self.MAX_REQ_SIZE:
    199             data = np.zeros((zsize, ysize, xsize), self.dtype.value)

TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'
```
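Both errors follow from how Python hands the key to `__getitem__`: `array[0]` passes the bare integer `0` rather than a 3-tuple, and `:5` arrives as `slice(None, 5, None)`, so `x.stop - x.start` subtracts `None`. Python's built-in `slice.indices(length)` fills in missing bounds and resolves negative ones. The probe class below is hypothetical and only shows what `__getitem__` receives; the axis length 200 is an arbitrary assumption standing in for the array's real shape, which is exactly why `array.shape` matters here.

```python
class ShowKey:
    """Hypothetical probe that returns whatever key __getitem__ receives."""

    def __getitem__(self, key):
        return key


probe = ShowKey()
print(probe[0])         # 0  (a bare int, not a tuple of three indices)
print(probe[0, 0, :5])  # (0, 0, slice(None, 5, None))

# slice.indices resolves None (and negative) bounds for an axis of a given length.
x = probe[0, 0, :5][2]
start, stop, step = x.indices(200)  # 200: assumed length of the last axis
print(start, stop, step)            # 0 5 1
print(stop - start)                 # 5, no TypeError
```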

For reference, this interface of dtype, shape, and slicing is supported by h5py, netcdf4, zarr, and most other array storage technologies in Python. This has allowed other projects (like Dask) to support these formats without having to special-case them (docs here).
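To make that protocol concrete, here is a minimal sketch of an adapter exposing `.shape`, a real `numpy.dtype`, and NumPy-style `__getitem__` on top of a backend that, like diced today, only understands one explicit step-1 slice per axis. `SliceableStore` and the `fetch` callable are hypothetical names, not part of diced or Dask; an in-memory NumPy array stands in for the remote store. Positive steps, negative indices, `None` bounds, and omitted trailing axes are handled; negative steps, `Ellipsis`, and fancy indexing are not.

```python
import operator

import numpy as np


class SliceableStore:
    """Hypothetical adapter exposing .shape, .dtype and NumPy-style slicing.

    `fetch` is an assumed backend callable that only accepts one explicit
    step-1 slice per axis and returns a NumPy array.
    """

    def __init__(self, fetch, shape, dtype):
        self._fetch = fetch
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)  # a real numpy.dtype, not a custom enum

    @property
    def ndim(self):
        return len(self.shape)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) > self.ndim:
            raise IndexError("too many indices for array")
        # Missing trailing axes mean "everything", as in NumPy.
        key = key + (slice(None),) * (self.ndim - len(key))

        box = []    # explicit step-1 slices sent to the backend
        local = []  # indexing applied to the fetched block afterwards
        for k, n in zip(key, self.shape):
            if isinstance(k, slice):
                start, stop, step = k.indices(n)  # resolves None and negatives
                if step < 0:
                    raise NotImplementedError("negative steps not supported")
                box.append(slice(start, max(start, stop)))
                local.append(slice(None, None, step))
            else:
                i = operator.index(k)
                if i < 0:
                    i += n
                if not 0 <= i < n:
                    raise IndexError("index %d out of bounds for size %d" % (k, n))
                box.append(slice(i, i + 1))
                local.append(0)  # integer index drops the axis
        return self._fetch(tuple(box))[tuple(local)]


# In-memory stand-in for the remote volume.
data = np.arange(120, dtype=np.uint8).reshape(4, 5, 6)
store = SliceableStore(lambda box: data[box], data.shape, data.dtype)
print(store.shape, store.dtype, store[0, 0, :5])  # (4, 5, 6) uint8 [0 1 2 3 4]
```

Fetching the step-1 bounding box and applying the step locally keeps the backend request simple at the cost of transferring skipped elements. With Dask installed, an object like this should be usable via `dask.array.from_array(store, chunks=...)` without any special-casing.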

@stephenplaza
Contributor

Thanks for the feedback! It should be straightforward for me to add '.shape' at least.
