Merge pull request #153 from rafaqz/refactor

Refactor: @generated, selectors and docs
rafaqz · Aug 24, 2020 · 9346a10
2 parents c211ab6 + 88b209a

Showing 25 changed files with 1,980 additions and 1,055 deletions.
diff --git a/Project.toml b/Project.toml
@@ -1,7 +1,7 @@
 name = "DimensionalData"
 uuid = "0703355e-b756-11e9-17c0-8b28908087d0"
 authors = ["Rafael Schouten <[email protected]>"]
-version = "0.11.2"
+version = "0.12.0"
 
 [deps]
 ConstructionBase = "187b0558-2788-49d3-abe0-74a17ed4e7c9"

diff --git a/README.md b/README.md
@@ -14,12 +14,6 @@ syntax, and additional functionality found in NamedDimensions.jl. It has similar
 goals to pythons [xarray](http://xarray.pydata.org/en/stable/), and is primarily
 written for use with spatial data in [GeoData.jl](https://github.com/rafaqz/GeoData.jl).
 
-:exclamation: | Status
-:-----------: | :-------
-
-    This is a work in progress under active development, it may be a while before
-    the interface stabilises and things are fully documented.
-
 
 ## Dimensions
 
@@ -35,7 +29,31 @@ We can use dim wrappers for indexing, so that the dimension order in the underly
 does not need to be known:
 
 ```julia
-a[X(1:10), Y(1:4)]
+julia> using DimensionalData
+
+julia> A = DimArray(rand(40, 50), (X, Y));
+
+julia> A[Y(1), X(1:10)]
+DimArray with dimensions:
+ X: 1:10 (NoIndex)
+and referenced dimensions:
+ Y: 1 (NoIndex)
+and data: 10-element Array{Float64,1}
+[0.515774, 0.575247, 0.429075, 0.234041, 0.4484, 0.302562, 0.911098, 0.541537, 0.267234, 0.370663]
+```
+
+And this has no runtime cost compared to indexing the parent array directly:
+
+```julia
+julia> using BenchmarkTools
+
+julia> @btime A[X(1), Y(2)]
+  25.068 ns (1 allocation: 16 bytes)
+0.7302366320496405
+
+julia> @btime parent(A)[1, 2]
+  34.061 ns (1 allocation: 16 bytes)
+0.7302366320496405
 ```
 
 The core component is the `AbstractDimension`, and types that inherit from it,
@@ -45,16 +63,68 @@ define manually using the `@dim` macro.
 Dims can be used for indexing and views without knowing dimension order:
 
 ```julia
-a[X(20)]
-view(a, X(1:20), Y(30:40))
+A[X(10)]
+view(A, Y(30:40), X(1:20))
 ```
 
 And for indicating dimensions to reduce or permute in julia
 `Base` and `Statistics` functions that have dims arguments:
 
 ```julia
-`mean(a, dims=Time)` 
-`permutedims(a, [X, Y, Z, Time])` 
+using Statistics
+
+A = DimArray(rand(10, 10, 100), (X, Y, Ti));
+mean(A, dims=Ti)
+permutedims(A, [Ti, Y, X]) 
+```
+
+You can also use arbitrary symbols to create `Dim{X}` dimensions:
+
+
+```julia
+julia> A = DimArray(rand(10, 20, 30), (:a, :b, :c));
+
+julia> A[a=2:5, c=9]
+
+DimArray with dimensions:
+ Dim{:a}: 2:5 (NoIndex)
+ Dim{:b}: Base.OneTo(20) (NoIndex)
+and referenced dimensions:
+ Dim{:c}: 9 (NoIndex)
+and data: 4×20 Array{Float64,2}
+ 0.868237   0.528297   0.32389   …  0.89322   0.6776    0.604891
+ 0.635544   0.0526766  0.965727     0.50829   0.661853  0.410173
+ 0.732377   0.990363   0.728461     0.610426  0.283663  0.00224321
+ 0.0849853  0.554705   0.594263     0.217618  0.198165  0.661853
+```
+
+Other methods also work:
+
+```julia
+julia> bounds(A, (:b, :c))
+
+((1, 20), (1, 30))
+
+julia> mean(A, dims=Dim{:b})
+DimArray with dimensions:
+ Dim{:a}: Base.OneTo(10) (NoIndex)
+ Dim{:b}: 1 (NoIndex)
+ Dim{:c}: Base.OneTo(30) (NoIndex)
+and data: 10×1×30 Array{Float64,3}
+[:, :, 1]
+ 0.543099
+ 0.542407
+ 0.540647
+ 0.513554
+ 0.601689
+ 0.601558
+ 0.46997
+ 0.524254
+ 0.601844
+ 0.520966
+[and 29 more slices...]
 ```
 
 
@@ -75,18 +145,27 @@ Selectors find indices in the dimension based on values `At`, `Near`, or
 We can use selectors with dim wrappers:
 
 ```julia
-a[X(Between(1, 10)), Y(At(25.7))]
+A[X(Between(1, 10)), Y(At(25.7))]
 ```
 
 Without dim wrappers selectors must be in the right order:
 
 ```julia
 using Unitful
-a[Near(23u"s"), Between(10.5u"m", 50.5u"m")]
+
+julia> A = DimArray(rand(10, 20), (X((1:10:100)u"m"), Ti((1:5:100)u"s")));
+
+julia> A[Between(10.5u"m", 50.5u"m"), Near(23u"s")]
+DimArray with dimensions:
+ X: (11:10:41) m (Sampled: Ordered Regular Points)
+and referenced dimensions:
+ Time (type Ti): 21 s (Sampled: Ordered Regular Points)
+and data: 4-element Array{Float64,1}
+[0.819172, 0.418113, 0.461722, 0.379877]
 ```
 
-For values other than `Int`/`AbstractArray`/`Colon` (which are set aside for regular indexing) the `At`
-selector is assumed, and can be dropped completely:
+For values other than `Int`/`AbstractArray`/`Colon` (which are set aside for 
+regular indexing) the `At` selector is assumed, and can be dropped completely:
 
 ```julia
 julia> A = DimArray(rand(3, 3), (X(Val((:a, :b, :c))), Y([25.6, 25.7, 25.8])))
@@ -102,6 +181,33 @@ julia> A[:b, 25.8]
 0.61839141062599
 ```
 
+Using `Val` indexes for all dimensions (only recommended for small arrays),
+you can index with named dimensions `At` arbitrary values with no
+runtime cost:
+
+
+```julia
+julia> using BenchmarkTools
+
+julia> A = DimArray(rand(3, 3), (cat=Val((:a, :b, :c)), 
+                                 val=Val((5.0, 6.0, 7.0))))
+DimArray with dimensions:
+ Dim{:cat}: Val{(:a, :b, :c)}() (Categorical: Unordered)
+ Dim{:val}: Val{(5.0, 6.0, 7.0)}() (Categorical: Unordered)
+and data: 3×3 Array{Float64,2}
+ 0.993357  0.765515  0.914423
+ 0.405196  0.98223   0.330779
+ 0.365312  0.388873  0.88732
+
+julia> @btime A[:a, 7.0]
+  26.333 ns (1 allocation: 16 bytes)
+0.32927504968939925
+
+julia> @btime A[cat=:a, val=7.0]
+  31.920 ns (2 allocations: 48 bytes)
+0.7476441117572306
+```
+
 It's also easy to write your own custom `Selector` if you need a different behaviour.
 
 _Example usage:_
@@ -141,6 +247,7 @@ Base and Statistics methods:
 - `mean`, `median`, `extrema`, `std`, `var`, `cor`, `cov`
 - `permutedims`, `adjoint`, `transpose`, `Transpose`
 - `mapslices`, `eachslice`
+- `fill`
 
 _Example usage:_
 

diff --git a/docs/src/course.md b/docs/src/course.md
@@ -20,29 +20,42 @@ A `DimArray` with labelled dimensions is constructed by:
 ```@example main
 using DimensionalData
 A = DimArray(rand(5, 5), (X, Y))
+A[Y(1), X(2)]
 ```
 
-But often we want to provide values for the dimension.
+Or we can create `Dim{X}` dimensions using `Symbol`s:
+
+```@example main
+A = DimArray(rand(5, 5), (:a, :b))
+A[a=3, b=5]
+```
+
+But often, we want to provide a lookup index for the dimension:
 
 ```@example main
 using Dates
-t = Ti(DateTime(2001):Month(1):DateTime(2001,12))
-x = X(10:10:100)
+t = DateTime(2001):Month(1):DateTime(2001,12)
+x = 10:10:100
+A = DimArray(rand(12, 10), (Ti(t), X(x)))
 ```
 
 Here both `X` and `Ti` are dimensions from `DimensionalData`. The currently
-exported dimensions are `X, Y, Z, Ti`. `Ti` is shortening of `Time` -
-to avoid the conflict with `Dates.Time`.
+exported dimensions are `X, Y, Z, Ti`. `Ti` is short for `Time`, to avoid
+a conflict with `Dates.Time`.
 
-We pass a `Tuple` of the dimensions to the constructor of `DimArray`,
-after the array:
+The length of each dimension index has to match the size of the corresponding
+array axis. 
+
+This can also be done with `Symbol`s, using `Dim{X}`:
 
 ```@example main
-A = DimArray(rand(12, 10), (t, x))
+A = DimArray(rand(12, 10), (time=t, distance=x))
 ```
 
-The length of each dimension index has to match the size of the corresponding
-array axis. 
+Symbols can be more convenient than defining dims with `@dim`, but have some
+downsides. They don't inherit from a specific `Dimension` type, so plots will
+not know what axis to put them on. If you need to specify the dimension `mode`
+or `metadata` manually, the `Dim{X}` syntax becomes less beneficial. 
 
 
 ## Indexing the array by name and index
@@ -115,17 +128,17 @@ standard `Array`:
 A[1:5]
 ```
 
-selects the first 5 entries of the underlying array. In the case that `A` has
-only one dimension, it will be retained. Multidimensional
-`AbstracDimArray` indexed this way will return a regular array.
+This selects the first 5 entries of the underlying array. If `A` has only one
+dimension, it will be retained. A multidimensional `AbstractDimArray` indexed
+this way will return a regular array.
 
 
 
 ## Specifying `dims` keyword arguments with `Dimension`
 
-In many Julia functions like `size, sum`, you can specify the dimension along
-which to perform the operation, as an `Int`. It is also possible to do this
-using `Dimension` types with `AbstractDimArray`:
+In many Julia functions like `size` or `sum`, you can specify the dimension
+along which to perform the operation as an `Int`. It is also possible to do this
+using [`Dimension`](@ref) types with `AbstractDimArray`:
 
 ```@example main
 sum(A; dims=X)
@@ -158,17 +171,22 @@ changes.
 DimensionalData provides types for specifying details about the dimension index.
 This enables optimisations with `Selector`s, and modified behaviours such as
 selection of intervals or points, which will give slightly different results for
-selectors like [`Between`](@ref).
-
-The major categories are [`Categorical`](@ref), [`Sampled`](@ref) and
-[`NoIndex`](@ref), which are all types of [`Aligned`](@ref).
-[`Unaligned`](@ref) also exists to handle dimensions with an index that is
-rotated or otherwise transformed in relation to the underlying array, such as
-[`Transformed`](@ref). These are a work in progress.
-
-[`Aligned`] types will be detected automatically if not specified. A
-Dimension containing and index of `String`, `Char` or `Symbol` will be labelled
-with [`Categorical`](@ref). A range will be [`Sampled`](@ref),
-defaulting to [`Points`](@ref) and [`Regular`](@ref). 
+selectors like [`Between`](@ref) for [`Points`](@ref) and [`Intervals`](@ref).
+
+It also allows plots to always be the right way up when either the index or the
+array is backwards, by reversing the data lazily when it is plotted rather than
+when it is loaded.
+
+The major categories of [`IndexMode`](@ref) are [`Categorical`](@ref),
+[`Sampled`](@ref) and [`NoIndex`](@ref), which are all subtypes of
+[`Aligned`](@ref). [`Unaligned`](@ref) also exists to handle dimensions with an
+index that is rotated or otherwise transformed in relation to the underlying
+array, such as [`Transformed`](@ref). These are a work in progress.
+
+[`Aligned`](@ref) types will be detected automatically if not specified. A
+Dimension containing an index of `String`, `Char` or `Symbol` will be given the
+[`Categorical`](@ref) mode. A range will be [`Sampled`](@ref), defaulting to
+[`Points`](@ref) and [`Regular`](@ref), with the [`Order`](@ref) detected
+automatically. 
 
 See the api docs for specifics about these [`IndexMode`](@ref)s.
diff --git a/docs/src/developer.md b/docs/src/developer.md
@@ -11,6 +11,8 @@
 - Minimal interface: implementing a dimension-aware type should be easy.
 - Functional style: structs are always rebuilt, and other than the array data,
   fields are not mutated in place.
+- Laziness: label data correctly, and manipulate it when needed instead of
+  standardising eagerly.
 - Least surprise: everything works the same as in Base, but with named dims. If
   a method accepts numeric indices or `dims=X` in base, you should be able to
   use DimensionalData.jl dims.
@@ -45,22 +47,27 @@ packages and scripts.
 ### Syntax
 
 AxisArrays.jl is verbose by default: `a[Axis{:y}(1)]` vs `a[Y(1)]` used here.
-NamedDims.jl has concise syntax, but the dimensions are no longer types.
+NamedDims.jl has concise syntax, but the dimensions are no longer types.
+NamedDims.jl syntax can now be replicated using `Dim{:X}`:
 
+```julia
+A = DimArray(rand(4, 5), (:a, :b))
+A[b=5, a=3] = 25.0
+```
 
 ## Data types and the interface
 
-DimensionalData.jl provides the concrete `DimenstionalArray` type. But it's
+DimensionalData.jl provides the concrete `DimArray` type, but its
 core purpose is to be easily used with other array types.
 
 Some of the functionality in DimensionalData.jl will work without inheriting
 from `AbstractDimArray`. The main requirement is to define a `dims` method
-that returns a `Tuple` of `AbstractDimension` that matches the dimension order
+that returns a `Tuple` of `Dimension` that matches the dimension order
 and axis values of your data. Define `rebuild`, and base methods for `similar`
 and `parent` if you want the metadata to persist through transformations (see
 the `DimArray` and `AbstractDimArray` types). A `refdims` method
 returns the lost dimensions of a previous transformation, passed in to the
 `rebuild` method. `refdims` can be discarded, the main loss being plot labels.
 
-Inheriting from `AbstractDimArray` will give all the functionality
+Inheriting from `AbstractDimArray` will give nearly all the functionality
 of using `DimArray`.
diff --git a/src/DimensionalData.jl b/src/DimensionalData.jl
@@ -42,19 +42,18 @@ export Unaligned, Transformed
 
 export AbstractDimArray, DimArray, AbstractDimensionalArray, DimensionalArray
 
-export data, dims, refdims, mode, metadata, name, shortname,
-       val, label, units, order, bounds, locus, mode, <|
+export data, dims, refdims, mode, metadata, name, shortname, label, units,
+       val, index, order, sampling, span, bounds, locus, relation, <|
 
-export dimnum, hasdim, otherdims, commondims, setdims, swapdims, rebuild, 
-       modify, dimwise, dimwise!
+export dimnum, hasdim, otherdims, commondims, setdims, swapdims, sortdims, 
+       rebuild, modify, dimwise, dimwise!
 
 export order, indexorder, arrayorder, 
        reverseindex, reversearray, reorderindex, 
        reorderarray, reorderrelation
 
 export @dim
 
-
 include("interface.jl")
 include("mode.jl")
 include("dimension.jl")
@@ -72,4 +71,6 @@ include("prettyprint.jl")
 const AbstractDimensionalArray = AbstractDimArray
 const DimensionalArray = DimArray
 
+const DD = DimensionalData
+
 end