Materialize DimArray or DimStack From a Table #739

Open
wants to merge 40 commits into
base: main
Changes from 14 commits
60256a0
Table Materializer Methods
JoshuaBillson Jun 18, 2024
3526b96
Merged Main
JoshuaBillson Jun 18, 2024
eab2fa0
Made col Optional for DimArray
JoshuaBillson Jun 18, 2024
d4892df
Apply suggestions from code review
JoshuaBillson Jun 20, 2024
ea6751a
Handle coordinates with different loci
JoshuaBillson Jun 20, 2024
13c80da
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Jun 20, 2024
6a9d26e
replaced At() with Contains() in _coords_to_ords
JoshuaBillson Jun 20, 2024
9164c22
Added optional selectors and public methods for table materializer
JoshuaBillson Jun 25, 2024
2ebec1c
Updated table constructors for DimArray and DimStack
JoshuaBillson Jun 25, 2024
8e791bf
Updated DimArray and DimStack docs to include table materializer methods
JoshuaBillson Jul 5, 2024
4cd5f9d
Table materializer test cases
JoshuaBillson Jul 5, 2024
0c1991a
export table materializer methods
JoshuaBillson Jul 5, 2024
8758ba9
Merge branch 'rafaqz:main' into materialize
JoshuaBillson Jul 5, 2024
4534de5
Added Random to tables.jl test cases
JoshuaBillson Jul 5, 2024
119fa30
Merge branch 'rafaqz:main' into materialize
JoshuaBillson Aug 8, 2024
ed395ca
Update src/array/array.jl
JoshuaBillson Aug 8, 2024
00336af
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
532f887
Removed exports
JoshuaBillson Aug 8, 2024
c98dcb0
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Aug 8, 2024
06a2c91
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
3bacf33
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
4ced6f7
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
c846dfd
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
fe2c871
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
61f8220
Replaced selector type with instance.
JoshuaBillson Aug 8, 2024
3d28b43
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Aug 8, 2024
dbe7b99
Table materializer can now infer dimensions from the coordinates.
JoshuaBillson Aug 12, 2024
f410988
Update src/stack/stack.jl
JoshuaBillson Sep 18, 2024
a17f069
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
9bdded9
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
5451087
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
faf4d76
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
02f60a3
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
fafd357
Update src/table_ops.jl
JoshuaBillson Sep 22, 2024
d7f15f5
Update src/array/array.jl
JoshuaBillson Sep 25, 2024
34a0a69
Update src/table_ops.jl
JoshuaBillson Sep 26, 2024
d0b9eb7
Added support for guessing the dimension ordering and span for Dates …
JoshuaBillson Sep 26, 2024
32b0c00
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Sep 26, 2024
0ea72a0
Replaced LinRange with StepRangeLen in _build_dim
JoshuaBillson Sep 27, 2024
bc62932
Added Tables.istable check to DimArray constructor
JoshuaBillson Oct 15, 2024
4 changes: 4 additions & 0 deletions src/DimensionalData.jl
@@ -77,13 +77,17 @@ export dimnum, hasdim, hasselection, otherdims
# utils
export set, rebuild, reorder, modify, broadcast_dims, broadcast_dims!, mergedims, unmergedims

# table utils
export restore_array, coords_to_index

export groupby, seasons, months, hours, intervals, ranges

const DD = DimensionalData

# Common
include("interface.jl")
include("name.jl")
include("table_ops.jl")

# Arrays
include("array/array.jl")
12 changes: 10 additions & 2 deletions src/array/array.jl
@@ -334,7 +334,7 @@ end
DimArray <: AbstractDimArray

DimArray(data, dims, refdims, name, metadata)
DimArray(data, dims::Tuple; refdims=(), name=NoName(), metadata=NoMetadata())
DimArray(data, dims::Tuple; refdims=(), name=NoName(), metadata=NoMetadata(), selector=Contains)

The main concrete subtype of [`AbstractDimArray`](@ref).

@@ -344,12 +344,13 @@ moves dimensions to reference dimension `refdims` after reducing operations

## Arguments

- `data`: An `AbstractArray`.
- `data`: An `AbstractArray` or a table with coordinate columns corresponding to `dims`.
- `dims`: A `Tuple` of `Dimension`
- `name`: A string name for the array. Shows in plots and tables.
- `refdims`: reference dimensions. Usually set programmatically to track past
slices and reductions of dimension for labelling and reconstruction.
- `metadata`: `Dict` or `Metadata` object, or `NoMetadata()`
- `selector`: The coordinate selector type to use when materializing from a table.

Indexing can be done with all regular indices, or with [`Dimension`](@ref)s
and/or [`Selector`](@ref)s.
@@ -429,6 +430,13 @@ function DimArray(A::AbstractBasicDimArray;
newdata = collect(data)
DimArray(newdata, format(dims, newdata); refdims, name, metadata)
end
# Write a single column from a table with one or more coordinate columns to a DimArray
function DimArray(table, dims; name=NoName(), selector=DimensionalData.Contains, kw...)
indices = coords_to_index(table, dims; selector=selector)
col = name == NoName() ? _data_col_names(table, dims) |> first : Symbol(name)
data = restore_array(Tables.getcolumn(table, col), indices, dims; missingval=missing)
return DimArray(data, dims, name=col; kw...)
end
"""
DimArray(f::Function, dim::Dimension; [name])

8 changes: 8 additions & 0 deletions src/stack/stack.jl
@@ -278,6 +278,7 @@ end
"""
DimStack <: AbstractDimStack

DimStack(table, dims; kw...)
DimStack(data::AbstractDimArray...; kw...)
DimStack(data::Tuple{Vararg{AbstractDimArray}}; kw...)
DimStack(data::NamedTuple{Keys,Vararg{AbstractDimArray}}; kw...)
@@ -420,5 +421,12 @@ function DimStack(data::NamedTuple, dims::Tuple;
all(map(d -> axes(d) == axes(first(data)), data)) || _stack_size_mismatch()
DimStack(data, format(dims, first(data)), refdims, layerdims, metadata, layermetadata)
end
# Write each column from a table with one or more coordinate columns to a layer in a DimStack
function DimStack(table, dims::Tuple; selector=DimensionalData.Contains, kw...)
data_cols = _data_cols(table, dims)
Owner
Again we probably need a Tables.istable check here

indices = coords_to_index(table, dims; selector=selector)
arrays = [restore_array(d, indices, dims; missingval=missing) for d in values(data_cols)]
return DimStack(NamedTuple{keys(data_cols)}(arrays), dims; kw...)
end

layerdims(s::DimStack{<:Any,<:Any,<:Any,<:Any,<:Any,<:Any,Nothing}, name::Symbol) = dims(s)
163 changes: 163 additions & 0 deletions src/table_ops.jl
@@ -0,0 +1,163 @@
"""
restore_array(data, indices, dims; missingval=missing)

Restore a dimensional array from a set of values and their corresponding indices.

# Arguments
- `data`: An `AbstractVector` of values to write to the destination array.
- `indices`: The flat index of each value in `data`.
- `dims`: A `Tuple` of `Dimension` for the corresponding destination array.
- `missingval`: The value to store for missing indices.

# Example
```julia
julia> d = DimArray(rand(256, 256), (X, Y));

julia> t = DimTable(d);

julia> indices = coords_to_index(t, dims(d));

julia> restored = restore_array(Tables.getcolumn(t, :value), indices, dims(d));

julia> all(restored .== d)
true
```
"""
function restore_array(data::AbstractVector, indices::AbstractVector{<:Integer}, dims::Tuple; missingval=missing)
# Allocate Destination Array
dst_size = prod(map(length, dims))
dst = Vector{eltype(data)}(undef, dst_size)
dst[indices] .= data

# Handle Missing Rows
_missingval = _cast_missing(data, missingval)
missing_rows = ones(Bool, dst_size)
missing_rows[indices] .= false
data = ifelse.(missing_rows, _missingval, dst)

# Reshape Array
return reshape(data, size(dims))
end
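
The scatter-and-fill step in `restore_array` can be tried without DimensionalData. The following is a minimal Base-only sketch; `scatter_with_missing` and its `sz` argument are hypothetical names used for illustration and are not part of this PR. It assumes flat indices follow Julia's column-major order, as the function above does.

```julia
# Hypothetical helper mirroring the scatter-and-fill idea in `restore_array`,
# using a plain size tuple instead of DimensionalData dimensions.
function scatter_with_missing(data::AbstractVector, indices::AbstractVector{<:Integer}, sz::Tuple)
    # Allocate a destination that can hold either a value or `missing`
    dst = Vector{Union{Missing,eltype(data)}}(missing, prod(sz))
    # Write each value to its flat (column-major) position
    dst[indices] .= data
    # Positions never written stay `missing`
    return reshape(dst, sz)
end

A = scatter_with_missing([10, 20, 30], [1, 4, 6], (2, 3))
```

Unlike the PR's version, this sketch pre-fills the destination with `missing` rather than building a boolean mask, which is a simplification and not a claim about how the PR should be written.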

"""
coords_to_index(table, dims; selector=Contains)

Return the flat index of each row in `table` based on its associated coordinates.
Dimension columns are determined from the name of each dimension in `dims`.
It is assumed that the source/destination array has the same dimension order as `dims`.

# Arguments
- `table`: A table representation of a dimensional array.
- `dims`: A `Tuple` of `Dimension` corresponding to the source/destination array.
- `selector`: The selector type to use for non-numerical/irregular coordinates.

# Example
```julia
julia> d = DimArray(rand(256, 256), (X, Y));

julia> t = DimTable(d);

julia> coords_to_index(t, dims(d))
65536-element Vector{Int64}:
1
2
⋮
65535
65536
```
"""
function coords_to_index(table, dims::Tuple; selector=DimensionalData.Contains)
return _sort_coords(table, dims, selector)
end

# Find the order of the table's rows according to the coordinate values
_sort_coords(table, dims::Tuple, ::Type{T}) where {T <: DimensionalData.Selector} = _sort_coords(_dim_cols(table, dims), dims, T)
function _sort_coords(coords::NamedTuple, dims::Tuple, ::Type{T}) where {T <: DimensionalData.Selector}
ords = _coords_to_ords(coords, dims, T)
indices = _ords_to_indices(ords, dims)
return indices
end

# Extract coordinate columns from table
function _dim_cols(table, dims::Tuple)
dim_cols = name(dims)
return NamedTuple{dim_cols}(Tables.getcolumn(table, col) for col in dim_cols)
end

# Extract data columns from table
function _data_cols(table, dims::Tuple)
data_cols = _data_col_names(table, dims)
return NamedTuple{Tuple(data_cols)}(Tables.getcolumn(table, col) for col in data_cols)
end

# Get names of data columns from table
function _data_col_names(table, dims::Tuple)
dim_cols = name(dims)
return filter(x -> !(x in dim_cols), Tables.columnnames(table))
end

# Determine the ordinality of a set of regularly spaced numerical coordinates with a starting locus
function _coords_to_ords(
coords::AbstractVector,
dim::Dimension,
::Type{<:DimensionalData.Selector},
::Type{<:Real},
::DimensionalData.Start,
::DimensionalData.Regular)
step = (last(dim) - first(dim)) / (length(dim) - 1)
return floor.(Int, ((coords .- first(dim)) ./ step) .+ 1)
end

# Determine the ordinality of a set of regularly spaced numerical coordinates with a central locus
function _coords_to_ords(
coords::AbstractVector,
dim::Dimension,
::Type{<:DimensionalData.Selector},
::Type{<:Real},
::DimensionalData.Center,
::DimensionalData.Regular)
step = (last(dim) - first(dim)) / (length(dim) - 1)
return round.(Int, ((coords .- first(dim)) ./ step) .+ 1)
end

# Determine the ordinality of a set of regularly spaced numerical coordinates with an end locus
function _coords_to_ords(
coords::AbstractVector,
dim::Dimension,
::Type{<:DimensionalData.Selector},
::Type{<:Real},
::DimensionalData.End,
::DimensionalData.Regular)
step = (last(dim) - first(dim)) / (length(dim) - 1)
return ceil.(Int, ((coords .- first(dim)) ./ step) .+ 1)
end
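
The three regular-span methods above differ only in the rounding applied to the fractional position. A small Base-only sketch of that difference follows; the axis values (`first_coord`, `step_size`) and the helper `frac_ord` are made up for illustration and do not come from the PR.

```julia
# Hypothetical regular axis: first coordinate and spacing are made-up values.
first_coord = -250.0
step_size = 5.0

# Fractional (1-based) position of a coordinate along the axis
frac_ord(c) = (c - first_coord) / step_size + 1

c = -243.0                             # lies 1.4 steps past the first coordinate
start_ord = floor(Int, frac_ord(c))    # Start locus: round down
center_ord = round(Int, frac_ord(c))   # Center locus: round to nearest
end_ord = ceil(Int, frac_ord(c))       # End locus: round up

# Julia's default `round` sends exact ties to the even integer
tie_ord = round(Int, 2.5)
```

Here `start_ord` and `center_ord` are both 2 while `end_ord` is 3. The tie case is worth keeping in mind for Center-locus coordinates that fall exactly halfway between two positions; whether that matters in practice for this PR is not settled by the diff.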

# Determine the ordinality of a set of categorical or irregular coordinates
function _coords_to_ords(coords::AbstractVector, dim::Dimension, ::Type{T}, ::Any, ::Any, ::Any) where {T<:DimensionalData.Selector}
return map(c -> DimensionalData.selectindices(dim, T(c)), coords)
end

# Determine the ordinality of a set of coordinates
_coords_to_ords(coords::AbstractVector, dim::Dimension, ::Type{T}) where {T <: DimensionalData.Selector} = _coords_to_ords(coords, dim, T, eltype(dim), locus(dim), span(dim))
_coords_to_ords(coords::Tuple, dims::Tuple, ::Type{T}) where {T <: DimensionalData.Selector} = Tuple(_coords_to_ords(c, d, T) for (c, d) in zip(coords, dims))
_coords_to_ords(coords::NamedTuple, dims::Tuple, ::Type{T}) where {T <: DimensionalData.Selector} = _coords_to_ords(map(x -> coords[x], name(dims)), dims, T)

# Determine the index from a tuple of coordinate orders
function _ords_to_indices(ords, dims)
stride = 1
indices = ones(Int, length(ords[1]))
for (ord, dim) in zip(ords, dims)
indices .+= (ord .- 1) .* stride
stride *= length(dim)
end
return indices
end
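
The stride accumulation in `_ords_to_indices` amounts to column-major linear indexing. As a sanity check, a Base-only sketch can compare it against `LinearIndices`; `ords_to_flat` is a hypothetical stand-in that takes axis lengths instead of dimensions, and the sample ordinals are invented.

```julia
# Hypothetical stand-in for `_ords_to_indices`, taking axis lengths instead of dims.
function ords_to_flat(ords::Tuple, lens::Tuple)
    stride = 1
    indices = ones(Int, length(first(ords)))
    for (ord, len) in zip(ords, lens)
        # Each axis contributes (ordinal - 1) times the product of earlier lengths
        indices .+= (ord .- 1) .* stride
        stride *= len
    end
    return indices
end

lens = (3, 4)
xs = [1, 3, 2]   # per-row ordinal along the first axis
ys = [1, 4, 2]   # per-row ordinal along the second axis
flat = ords_to_flat((xs, ys), lens)

# Reference result from Base's own column-major linear indexing
expected = [LinearIndices(lens)[x, y] for (x, y) in zip(xs, ys)]
```

For these inputs both `flat` and `expected` come out as `[1, 12, 5]`, which is consistent with the docstring's assumption that the destination array has the same dimension order as `dims`.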

_cast_missing(::AbstractArray, missingval::Missing) = missing
function _cast_missing(::AbstractArray{T}, missingval) where {T}
try
return convert(T, missingval)
catch e
return missingval
end
end
47 changes: 46 additions & 1 deletion test/tables.jl
@@ -1,4 +1,4 @@
using DimensionalData, IteratorInterfaceExtensions, TableTraits, Tables, Test, DataFrames
using DimensionalData, IteratorInterfaceExtensions, TableTraits, Tables, Test, DataFrames, Random

using DimensionalData.Lookups, DimensionalData.Dimensions
using DimensionalData: DimTable, DimExtensionArray
@@ -154,3 +154,48 @@ end
@test Tables.columnnames(t3) == (:dimensions, :layer1, :layer2, :layer3)
@test Tables.columnnames(t4) == (:band, :geometry, :value)
end

@testset "Materialize from table" begin
a = DimArray(rand(UInt8, 100, 100), (X(100:-1:1), Y(-250:5:249)))
b = DimArray(rand(Float32, 100, 100), (X(100:-1:1), Y(-250:5:249)))
c = DimArray(rand(Float64, 100, 100), (X(100:-1:1), Y(-250:5:249)))
ds = DimStack((a=a, b=b, c=c))
t = DataFrame(ds)
t1 = Random.shuffle(t)
t2 = t[101:end,:]

# Restore DimArray from shuffled table
@test all(DimArray(t1, dims(ds)) .== a)
@test all(DimArray(t1, dims(ds), name="a") .== a)
@test all(DimArray(t1, dims(ds), name="b") .== b)
@test all(DimArray(t1, dims(ds), name="c") .== c)

# Restore DimArray from table with missing rows
@test all(DimArray(t2, dims(ds), name="a")[Y(2:100)] .== a[Y(2:100)])
@test all(DimArray(t2, dims(ds), name="b")[Y(2:100)] .== b[Y(2:100)])
@test all(DimArray(t2, dims(ds), name="c")[Y(2:100)] .== c[Y(2:100)])
@test DimArray(t2, dims(ds), name="a")[Y(1)] .|> ismissing |> all
@test DimArray(t2, dims(ds), name="b")[Y(1)] .|> ismissing |> all
@test DimArray(t2, dims(ds), name="c")[Y(1)] .|> ismissing |> all
@test DimArray(t2, dims(ds), name="a")[Y(2:100)] .|> ismissing .|> (!) |> all
@test DimArray(t2, dims(ds), name="b")[Y(2:100)] .|> ismissing .|> (!) |> all
@test DimArray(t2, dims(ds), name="c")[Y(2:100)] .|> ismissing .|> (!) |> all

# Restore DimStack from shuffled table
restored_stack = DimStack(t1, dims(ds))
@test all(restored_stack.a .== ds.a)
@test all(restored_stack.b .== ds.b)
@test all(restored_stack.c .== ds.c)

# Restore DimStack from table with missing rows
restored_stack = DimStack(t2, dims(ds))
@test all(restored_stack.a[Y(2:100)] .== ds.a[Y(2:100)])
@test all(restored_stack.b[Y(2:100)] .== ds.b[Y(2:100)])
@test all(restored_stack.c[Y(2:100)] .== ds.c[Y(2:100)])
@test restored_stack.a[Y(1)] .|> ismissing |> all
@test restored_stack.b[Y(1)] .|> ismissing |> all
@test restored_stack.c[Y(1)] .|> ismissing |> all
@test restored_stack.a[Y(2:100)] .|> ismissing .|> (!) |> all
@test restored_stack.b[Y(2:100)] .|> ismissing .|> (!) |> all
@test restored_stack.c[Y(2:100)] .|> ismissing .|> (!) |> all
end