Merge pull request #153 from rafaqz/refactor
Refactor: @generated, selectors and docs
rafaqz authored Aug 24, 2020
2 parents c211ab6 + 88b209a commit 9346a10
Showing 25 changed files with 1,980 additions and 1,055 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "DimensionalData"
uuid = "0703355e-b756-11e9-17c0-8b28908087d0"
authors = ["Rafael Schouten <[email protected]>"]
version = "0.11.2"
version = "0.12.0"

[deps]
ConstructionBase = "187b0558-2788-49d3-abe0-74a17ed4e7c9"
137 changes: 122 additions & 15 deletions README.md
@@ -14,12 +14,6 @@ syntax, and additional functionality found in NamedDimensions.jl. It has similar
goals to Python's [xarray](http://xarray.pydata.org/en/stable/), and is primarily
written for use with spatial data in [GeoData.jl](https://github.com/rafaqz/GeoData.jl).

:exclamation: | Status
:-----------: | :-------

This is a work in progress under active development, it may be a while before
the interface stabilises and things are fully documented.


## Dimensions

@@ -35,7 +29,31 @@ We can use dim wrappers for indexing, so that the dimension order in the underly
does not need to be known:

```julia
a[X(1:10), Y(1:4)]
julia> using DimensionalData

julia> A = DimArray(rand(40, 50), (X, Y));

julia> A[Y(1), X(1:10)]
DimArray with dimensions:
X: 1:10 (NoIndex)
and referenced dimensions:
Y: 1 (NoIndex)
and data: 10-element Array{Float64,1}
[0.515774, 0.575247, 0.429075, 0.234041, 0.4484, 0.302562, 0.911098, 0.541537, 0.267234, 0.370663]
```

And this has no runtime cost:

```julia
julia> using BenchmarkTools

julia> @btime A[X(1), Y(2)]
25.068 ns (1 allocation: 16 bytes)
0.7302366320496405

julia> @btime parent(A)[1, 2]
34.061 ns (1 allocation: 16 bytes)
0.7302366320496405
```
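
One caveat when reproducing timings like these: `A` is a non-constant global in the REPL, so part of the measured time, and possibly the single allocation, may come from global-variable access rather than from the indexing itself. The following is a minimal sketch, assuming BenchmarkTools.jl is installed, that interpolates `A` with `$` as the BenchmarkTools documentation recommends. No timings are shown because they vary by machine.

```julia
using BenchmarkTools
using DimensionalData

A = DimArray(rand(40, 50), (X, Y))

# `$A` splices the value into the benchmark, so global lookup is not timed
@btime $A[X(1), Y(2)]
@btime parent($A)[1, 2]

# Both expressions read the same element of the underlying array
A[X(1), Y(2)] == parent(A)[1, 2]
```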

The core component is the `AbstractDimension`, and types that inherit from it,
@@ -45,16 +63,68 @@ define manually using the `@dim` macro.
Dims can be used for indexing and views without knowing dimension order:

```julia
a[X(20)]
view(a, X(1:20), Y(30:40))
A[X(10)]
view(A, Y(30:40), X(1:20))
```

And for indicating dimensions to reduce or permute in Julia
`Base` and `Statistics` functions that have `dims` arguments:

```julia
`mean(a, dims=Time)`
`permutedims(a, [X, Y, Z, Time])`
using Statistics

A = DimArray(rand(10, 10, 100), (X, Y, Ti));
mean(A, dims=Ti)
permutedims(A, [Ti, Y, X])
```
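
As a rough check of what the dimension-based `dims` arguments do, the sketch below compares them with their positional equivalents. It uses only names that appear in this commit (`DimArray`, `X`, `Y`, `Ti`, and the exported `dimnum`), and it assumes that `Ti` sits on the third axis, as in the constructor call.

```julia
using DimensionalData
using Statistics

A = DimArray(rand(10, 10, 100), (X, Y, Ti))

# `dimnum` gives the positional axis of a dimension; Ti is the third axis here
dimnum(A, Ti)

# Reducing over `Ti` collapses that axis to length 1, just as `dims=3` would
m = mean(A, dims=Ti)
size(m)                                  # (10, 10, 1)
parent(m) ≈ mean(parent(A), dims=3)

# permutedims reorders the axes to match the order of the dims given
P = permutedims(A, [Ti, Y, X])
size(P)                                  # (100, 10, 10)
```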

You can also use arbitrary symbols to create `Dim{X}` dimensions:


```julia
julia> A = DimArray(rand(10, 20, 30), (:a, :b, :c));

julia> A[a=2:5, c=9]

DimArray with dimensions:
Dim{:a}: 2:5 (NoIndex)
Dim{:b}: Base.OneTo(20) (NoIndex)
and referenced dimensions:
Dim{:c}: 9 (NoIndex)
and data: 4×20 Array{Float64,2}
0.868237 0.528297 0.32389 0.89322 0.6776 0.604891
0.635544 0.0526766 0.965727 0.50829 0.661853 0.410173
0.732377 0.990363 0.728461 0.610426 0.283663 0.00224321
0.0849853 0.554705 0.594263 0.217618 0.198165 0.661853
```
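
The keyword form is shorthand for wrapping indices in `Dim{name}` dimensions. The sketch below assumes that `Dim{:a}(2)` can be constructed directly with an index value, in the same way as `X(2)`. It checks that both spellings reach the same element of the parent array.

```julia
using DimensionalData

A = DimArray(rand(10, 20, 30), (:a, :b, :c))

# Keyword syntax and explicit `Dim{name}` wrappers select the same element
A[a=2, b=3, c=9] == A[Dim{:a}(2), Dim{:b}(3), Dim{:c}(9)]

# Wrapper order does not matter; the underlying array is indexed as [2, 3, 9]
A[Dim{:c}(9), Dim{:a}(2), Dim{:b}(3)] == parent(A)[2, 3, 9]
```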

Other methods also work:

```julia
julia> bounds(A, (:b, :c))

((1, 20), (1, 30))

julia> mean(A, dims=Dim{:b})
DimArray with dimensions:
Dim{:a}: Base.OneTo(10) (NoIndex)
Dim{:b}: 1 (NoIndex)
Dim{:c}: Base.OneTo(30) (NoIndex)
and data: 10×1×30 Array{Float64,3}
[:, :, 1]
0.543099
0.542407
0.540647
0.513554
0.601689
0.601558
0.46997
0.524254
0.601844
0.520966
[and 29 more slices...]
```


@@ -75,18 +145,27 @@ Selectors find indices in the dimension based on values `At`, `Near`, or
We can use selectors with dim wrappers:

```julia
a[X(Between(1, 10)), Y(At(25.7))]
A[X(Between(1, 10)), Y(At(25.7))]
```
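
To make the three selectors concrete, here is a toy sketch in plain Julia of the kind of lookup each one performs on a sorted index. The helper names `at_index`, `near_index` and `between_indices` are hypothetical, and this is not DimensionalData's implementation, which also has to account for index order, sampling and units.

```julia
# Hypothetical helpers illustrating the idea behind At, Near and Between.
index = [25.6, 25.7, 25.8, 25.9]

# At: the position whose value matches exactly (`nothing` if absent)
at_index(index, v) = findfirst(==(v), index)

# Near: the position of the closest value
near_index(index, v) = argmin(abs.(index .- v))

# Between: all positions whose values fall in the closed range [lo, hi]
between_indices(index, lo, hi) = findall(x -> lo <= x <= hi, index)

at_index(index, 25.7)                  # 2
near_index(index, 25.76)               # 3
between_indices(index, 25.65, 25.85)   # [2, 3]
```

The positions returned are what ordinary integer indexing would then receive, which is why selectors and plain indices can be mixed in one call.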

Without dim wrappers selectors must be in the right order:

```julia
using Unitful
a[Near(23u"s"), Between(10.5u"m", 50.5u"m")]

julia> A = DimArray(rand(10, 20), (X((1:10:100)u"m"), Ti((1:5:100)u"s")))

julia> A[Between(10.5u"m", 50.5u"m"), Near(23u"s")]
DimArray with dimensions:
X: (11:10:41) m (Sampled: Ordered Regular Points)
and referenced dimensions:
Time (type Ti): 21 s (Sampled: Ordered Regular Points)
and data: 4-element Array{Float64,1}
[0.819172, 0.418113, 0.461722, 0.379877]
```

For values other than `Int`/`AbstractArray`/`Colon` (which are set aside for regular indexing) the `At`
selector is assumed, and can be dropped completely:
For values other than `Int`/`AbstractArray`/`Colon` (which are set aside for
regular indexing) the `At` selector is assumed, and can be dropped completely:

```julia
julia> A = DimArray(rand(3, 3), (X(Val((:a, :b, :c))), Y([25.6, 25.7, 25.8])))
@@ -102,6 +181,33 @@ julia> A[:b, 25.8]
0.61839141062599
```

Using all `Val` indexes (only recommended for small arrays)
you can index with named dimensions `At` arbitrary values with no
runtime cost:


```julia
using BenchmarkTools

julia> A = DimArray(rand(3, 3), (cat=Val((:a, :b, :c)),
val=Val((5.0, 6.0, 7.0))))
DimArray with dimensions:
Dim{:cat}: Val{(:a, :b, :c)}() (Categorical: Unordered)
Dim{:val}: Val{(5.0, 6.0, 7.0)}() (Categorical: Unordered)
and data: 3×3 Array{Float64,2}
0.993357 0.765515 0.914423
0.405196 0.98223 0.330779
0.365312 0.388873 0.88732

julia> @btime A[:a, 7.0]
26.333 ns (1 allocation: 16 bytes)
0.32927504968939925

julia> @btime A[cat=:a, val=7.0]
31.920 ns (2 allocations: 48 bytes)
0.7476441117572306
```

It's also easy to write your own custom `Selector` if you need a different behaviour.

_Example usage:_
@@ -141,6 +247,7 @@ Base and Statistics methods:
- `mean`, `median`, `extrema`, `std`, `var`, `cor`, `cov`
- `permutedims`, `adjoint`, `transpose`, `Transpose`
- `mapslices`, `eachslice`
- `fill`

_Example usage:_

74 changes: 46 additions & 28 deletions docs/src/course.md
@@ -20,29 +20,42 @@ A `DimArray` with labelled dimensions is constructed by:
```@example main
using DimensionalData
A = DimArray(rand(5, 5), (X, Y))
A[Y(1), X(2)]
```

But often we want to provide values for the dimension.
Or we can use the `Dim{X}` dim by using `Symbol`s:

```@example main
A = DimArray(rand(5, 5), (:a, :b))
A[a=3, b=5]
```

But often, we want to provide a lookup index for the dimension:

```@example main
using Dates
t = Ti(DateTime(2001):Month(1):DateTime(2001,12))
x = X(10:10:100)
t = DateTime(2001):Month(1):DateTime(2001,12)
x = 10:10:100
A = DimArray(rand(12, 10), (X(x), Ti(t)))
```

Here both `X` and `Ti` are dimensions from `DimensionalData`. The currently
exported dimensions are `X, Y, Z, Ti`. `Ti` is shortening of `Time` -
to avoid the conflict with `Dates.Time`.
exported dimensions are `X, Y, Z, Ti`. `Ti` is a shortening of `Time` - to avoid
the conflict with `Dates.Time`.

We pass a `Tuple` of the dimensions to the constructor of `DimArray`,
after the array:
The length of each dimension index has to match the size of the corresponding
array axis.

This can also be done with `Symbol`, using `Dim{X}`:

```@example main
A = DimArray(rand(12, 10), (t, x))
A = DimArray(rand(12, 10), (time=t, distance=x))
```

The length of each dimension index has to match the size of the corresponding
array axis.
Symbols can be more convenient than defining dims with `@dim`, but have some
downsides. They don't inherit from a specific `Dimension` type, so plots will
not know what axis to put them on. If you need to specify the dimension `mode`
or `metadata` manually, the `Dim{X}` syntax becomes less beneficial.


## Indexing the array by name and index
@@ -115,17 +128,17 @@ standard `Array`:
A[1:5]
```

selects the first 5 entries of the underlying array. In the case that `A` has
only one dimension, it will be retained. Multidimensional
`AbstracDimArray` indexed this way will return a regular array.
This selects the first 5 entries of the underlying array. In the case that `A`
has only one dimension, it will be retained. Multidimensional `AbstractDimArray`
indexed this way will return a regular array.



## Specifying `dims` keyword arguments with `Dimension`

In many Julia functions like `size, sum`, you can specify the dimension along
which to perform the operation, as an `Int`. It is also possible to do this
using `Dimension` types with `AbstractDimArray`:
In many Julia functions like `size` or `sum`, you can specify the dimension
along which to perform the operation as an `Int`. It is also possible to do this
using [`Dimension`](@ref) types with `AbstractDimArray`:

```@example main
sum(A; dims=X)
@@ -158,17 +171,22 @@ changes.
DimensionalData provides types for specifying details about the dimension index.
This enables optimisations with `Selector`s, and modified behaviours such as
selection of intervals or points, which will give slightly different results for
selectors like [`Between`](@ref).

The major categories are [`Categorical`](@ref), [`Sampled`](@ref) and
[`NoIndex`](@ref), which are all types of [`Aligned`](@ref).
[`Unaligned`](@ref) also exists to handle dimensions with an index that is
rotated or otherwise transformed in relation to the underlying array, such as
[`Transformed`](@ref). These are a work in progress.

[`Aligned`] types will be detected automatically if not specified. A
Dimension containing and index of `String`, `Char` or `Symbol` will be labelled
with [`Categorical`](@ref). A range will be [`Sampled`](@ref),
defaulting to [`Points`](@ref) and [`Regular`](@ref).
selectors like [`Between`](@ref) for [`Points`](@ref) and [`Intervals`](@ref).

It also allows plots to always be the right way up when either the index or the
array is backwards - reversing the data lazily when required for plotting,
not when loaded.

The major categories of [`IndexMode`](@ref) are [`Categorical`](@ref),
[`Sampled`](@ref) and [`NoIndex`](@ref), which are all subtypes of
[`Aligned`](@ref). [`Unaligned`](@ref) also exists to handle dimensions with an
index that is rotated or otherwise transformed in relation to the underlying
array, such as [`Transformed`](@ref). These are a work in progress.

[`Aligned`](@ref) types will be detected automatically if not specified. A
Dimension containing an index of `String`, `Char` or `Symbol` will be given the
[`Categorical`](@ref) mode. A range will be [`Sampled`](@ref), defaulting to
[`Points`](@ref) and [`Regular`](@ref), with the [`Order`](@ref) detected
automatically.

See the API docs for specifics about these [`IndexMode`](@ref)s.
15 changes: 11 additions & 4 deletions docs/src/developer.md
@@ -11,6 +11,8 @@
- Minimal interface: implementing a dimension-aware type should be easy.
- Functional style: structs are always rebuilt, and other than the array data,
fields are not mutated in place.
- Laziness. Label data correctly, and manipulate them when needed -
instead of standardising eagerly.
- Least surprise: everything works the same as in Base, but with named dims. If
a method accepts numeric indices or `dims=X` in base, you should be able to
use DimensionalData.jl dims.
@@ -45,22 +47,27 @@ packages and scripts.
### Syntax

AxisArrays.jl is verbose by default: `a[Axis{:y}(1)]` vs `a[Y(1)]` used here.
NamedDims.jl has concise syntax, but the dimensions are no longer types.
NamedDims.jl has concise syntax, but the dimensions are no longer types.
NamedDims.jl syntax can now be replicated using `Dim{:X}`:

```julia
A = DimArray(rand(4, 5), (:a, :b))
A[b=5, a=3] = 25.0
```
## Data types and the interface
DimensionalData.jl provides the concrete `DimenstionalArray` type. But it's
DimensionalData.jl provides the concrete `DimArray` type. But its
core purpose is to be easily used with other array types.
Some of the functionality in DimensionalData.jl will work without inheriting
from `AbstractDimArray`. The main requirement is to define a `dims` method
that returns a `Tuple` of `AbstractDimension` that matches the dimension order
that returns a `Tuple` of `Dimension` that matches the dimension order
and axis values of your data. Define `rebuild`, and base methods for `similar`
and `parent` if you want the metadata to persist through transformations (see
the `DimArray` and `AbstractDimArray` types). A `refdims` method
returns the lost dimensions of a previous transformation, passed in to the
`rebuild` method. `refdims` can be discarded, the main loss being plot labels.
Inheriting from `AbstractDimArray` will give all the functionality
Inheriting from `AbstractDimArray` will give nearly all the functionality
of using `DimArray`.
11 changes: 6 additions & 5 deletions src/DimensionalData.jl
@@ -42,19 +42,18 @@ export Unaligned, Transformed

export AbstractDimArray, DimArray, AbstractDimensionalArray, DimensionalArray

export data, dims, refdims, mode, metadata, name, shortname,
val, label, units, order, bounds, locus, mode, <|
export data, dims, refdims, mode, metadata, name, shortname, label, units,
val, index, order, sampling, span, bounds, locus, relation, <|

export dimnum, hasdim, otherdims, commondims, setdims, swapdims, rebuild,
modify, dimwise, dimwise!
export dimnum, hasdim, otherdims, commondims, setdims, swapdims, sortdims,
rebuild, modify, dimwise, dimwise!

export order, indexorder, arrayorder,
reverseindex, reversearray, reorderindex,
reorderarray, reorderrelation

export @dim


include("interface.jl")
include("mode.jl")
include("dimension.jl")
@@ -72,4 +71,6 @@ include("prettyprint.jl")
const AbstractDimensionalArray = AbstractDimArray
const DimensionalArray = DimArray

const DD = DimensionalData

end