Extending Base.stack for DimArrays #645

Open
wants to merge 30 commits into
base: main
from

@brendanjohnharris brendanjohnharris commented Feb 23, 2024

Adds methods for Base.stack, and related non-exported functions from Base, that are compatible with DimArrays.

Syntax follows Base: stacking DimArrays along a given axis dims creates a new dimension. However, existing dimension data is preserved, and the new dimension becomes an AnonDim.

Optionally, a Dimension dim can be provided as the first argument to stack, in which case the new dimension is assigned as dim rather than AnonDim.

@codecov-commenter

codecov-commenter commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.99%. Comparing base (9846bfe) to head (884ab68).
Report is 41 commits behind head on main.

Current head 884ab68 differs from pull request most recent head 2c7b8b9

Please upload reports for the commit 2c7b8b9 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #645      +/-   ##
==========================================
+ Coverage   83.83%   83.99%   +0.15%     
==========================================
  Files          45       45              
  Lines        4102     4136      +34     
==========================================
+ Hits         3439     3474      +35     
+ Misses        663      662       -1     


@@ -131,15 +131,15 @@ end
"""
function Base.eachslice(A::AbstractDimArray; dims)
dimtuple = _astuple(dims)
if !(dimtuple == ())
if !(dimtuple == ())
Owner

Would be nice to remove all these whitespace changes so we can see the real changes

end

newdims = first(origdims)
newdims = ntuple(d -> d == newdim ? AnonDim() : newdims[d-(d>newdim)], length(newdims) + 1)
Owner

maybe make this a do block so it's easier to read


To fix for `AbstractDimArray`, pass new lookup values as `cat(As...; dims=$D(newlookupvals))` keyword or `dims=$D()` for empty `NoLookup`.
"""

function Base._typed_stack(::Colon, ::Type{T}, ::Type{S}, A, Aax=_iterator_axes(A)) where {T,S<:AbstractDimArray}
origdims = dims.(A)
Owner

Suggested change
origdims = dims.(A)
origdims = map(dims, A)

DimArray(_A, newdims)
end

function Base.stack(dim::Dimension, A::AbstractVector{<:AbstractDimArray}; dims=nothing, kwargs...)
Owner

What's the idea with dim as the first argument?

The tests also don't hit this code.

@brendanjohnharris
Author

Thanks for the suggestions! Went through and cleaned up the PR.

The Base.stack(dim, X; dims) method lets you supply a Dimension to assign to the newly created array dimension; not strictly necessary, but more convenient than setting or rebuilding the resulting array to overwrite the new AnonDim (is an AnonDim a suitable way to treat a newly created dimension, as I've done here?).

Let me know if you have more thoughts on this, happy to work on it further :)

@brendanjohnharris brendanjohnharris marked this pull request as ready for review February 27, 2024 05:01
function Base.stack(dim::Dimension, A::AbstractVector{<:AbstractDimArray}; dims=nothing, kwargs...)
Owner

Why not pass dim in dims?

This method signature is a bit strange; we usually try not to add variants on Base methods, we just allow dims to specify named dimensions.

@rafaqz
Owner

rafaqz commented Feb 27, 2024

We try not to change base method signatures besides allowing dims to be named. To keep the mental model very simple.

I would just put that new Dimension in dims like cat does. If it's a constructed dimension it replaces the existing one. Otherwise Type/Symbol etc. just choose them. Just check d isa Dimension.

You will also need to format(newdims, A) afterwards.

@brendanjohnharris
Author

Base.stack behaves differently than Base.cat in that it always creates a new dimension (i.e. Base.stack([A, B,...]; dims=2) inserts a dimension at position 2, increasing ndims(A) by 1); in general this new dimension is unrelated to the existing array dimensions, so we need to specify both position (dims) and a Dimension (dim).
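The difference from cat can be checked on plain arrays, without DimensionalData. This is only a minimal sketch using Base (it assumes Julia 1.9 or later, where stack was added); the array values are illustrative:

```julia
# Plain Base arrays only; `stack` requires Julia 1.9 or later.
A = [1 2 3; 4 5 6]         # 2×3
B = [7 8 9; 10 11 12]      # 2×3

C = cat(A, B; dims=2)      # joins along the existing dimension 2
S = stack([A, B]; dims=2)  # inserts a new dimension at position 2

size(C)                    # (2, 6): ndims unchanged
size(S)                    # (2, 2, 3): ndims increased by 1

# Each input becomes one slice along the new dimension.
S[:, 1, :] == A            # true
S[:, 2, :] == B            # true
```

So cat only has to say which existing dimension to extend, while stack has to say where the new dimension goes, and for a DimArray also what that dimension should be.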

I see your point about keeping new signatures to a minimum, though. How about allowing the keyword argument dims to be assigned as an 'Integer=>Dimension' pair, with the following behavior:

  1. If dims isa Integer: the new dimension is an AnonDim at position dims
  2. If dims isa Dimension: the new dimension is the Dimension dims at position ndims(A)+1
  3. If dims isa Pair{<:Integer, <:Dimension}: the new dimension is the Dimension last(dims) at position first(dims)

We could also, instead, have that the new dimension is always an AnonDim, leaving it up to the user to set that dimension at a later time; in that case, however, there is ambiguity if there are any existing AnonDims in the input arrays.
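As a rough sketch of the three cases above: the helper name `_stack_spec` is hypothetical and not part of DimensionalData or this PR. It only resolves the dims keyword into a (position, dimension) pair, and assumes `n` is the ndims of each array being stacked:

```julia
using DimensionalData
using DimensionalData: Dimension, AnonDim

# Hypothetical helper, for illustration only.
# Case 1: an Integer gives the position; the new dimension is an AnonDim.
_stack_spec(dims::Integer, n::Int) = (Int(dims), AnonDim())
# Case 2: a Dimension is placed after the existing dimensions.
_stack_spec(dims::Dimension, n::Int) = (n + 1, dims)
# Case 3: position => Dimension. The parameters need `<:` because a concrete
# pair such as `1 => Z(1:2)` is not a `Pair{Integer,Dimension}`.
_stack_spec(dims::Pair{<:Integer,<:Dimension}, n::Int) = (Int(first(dims)), last(dims))

pos1, dim1 = _stack_spec(2, 2)            # position 2, AnonDim
pos2, dim2 = _stack_spec(Z(1:2), 2)       # position 3, Z
pos3, dim3 = _stack_spec(1 => Z(1:2), 2)  # position 1, Z
```

Dispatching on the type of dims keeps a single keyword and avoids adding a positional argument to the Base signature.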

@rafaqz
Owner

rafaqz commented Feb 28, 2024

Using a Pair sounds like a good solution (as do points 1 and 2 as simpler cases). For 2 I guess putting it as the last dimension would be the majority use-case? (I rarely use stack)

Always AnonDim will require using set all the time with AnonDim => somedim. We may as well just put that pair in stack and skip the step.

src/array/methods.jl (outdated, resolved)
end
newdims = format(newdims, B)

B = rebuild(B; dims=newdims)
Owner

Suggested change
B = rebuild(B; dims=newdims)
B = rebuild(B; dims=format(newdims, B))

User specified dims are usually incomplete and possibly incorrect

@brendanjohnharris brendanjohnharris force-pushed the stack branch 2 times, most recently from a93d665 to 7a4cf26 Compare March 7, 2024 05:01
@@ -547,6 +547,100 @@ $message on dimension $D.
To fix for `AbstractDimArray`, pass new lookup values as `cat(As...; dims=$D(newlookupvals))` keyword or `dims=$D()` for empty `NoLookup`.
"""

function Base._typed_stack(::Colon, ::Type{T}, ::Type{S}, A, Aax=_iterator_axes(A)) where {T,S<:AbstractDimArray}
Owner

@rafaqz rafaqz Mar 9, 2024

How stable do you think these methods are... could we add a method to Base.stack instead? What do we gain from touching these internals?

I know we use internals elsewhere, but we should stop:
#522

end

newdims = first(origdims)
newdims = ntuple(length(newdims) + 1) do d
Owner

Is this type-stable?

src/array/methods.jl (outdated, resolved)
src/array/methods.jl (outdated, resolved)
@rafaqz
Owner

rafaqz commented Nov 12, 2024

@brendanjohnharris any updates? I can also finish this if you don't have time

@brendanjohnharris
Author

brendanjohnharris commented Nov 15, 2024

Sorry fell off my radar, my bad
How about something much simpler? Base.stack now seems to work as expected as long as the input collection is a DimArray, except that if the elements have different dims the resulting stacked array uses the dims of the first element.
This latest commit just adds a comparedims check to Base.stack() and, if this check fails, returns the stacked parent array (like cat).
Then it's not much work at all to construct a DimVector{<:DimArray} with the new dimensions as needed:

using DimensionalData
a = [1 2 3; 4 5 6]
da = DimArray(a, (X(4.0:5.0), Y(6.0:8.0)))
b = [7 8 9; 10 11 12]
db = DimArray(b, (X(4.0:5.0), Y(6.0:8.0)))
x = DimArray([da, db], (Z(4.0:5.0),))
stack(x; dims=2) # Has dims X, *Z*, Y

@rafaqz
Owner

rafaqz commented Nov 16, 2024

Perfect, simpler is better!

But we could also allow Dimension dims using dimnum?

comparedims(Bool, dims.(iter)...; order=true, val=true, msg=Dimensions.Warn(" Can't `stack` AbstractDimArray, applying to `parent` object."))
iter
end
Base.stack(iter::AbstractArray{<:AbstractDimArray}; dims=:) = Base._stack(dims, check_stack_dims(iter))
Owner

Suggested change
Base.stack(iter::AbstractArray{<:AbstractDimArray}; dims=:) = Base._stack(dims, check_stack_dims(iter))
Base.stack(iter::AbstractArray{<:AbstractDimArray}; dims=:) = Base._stack(dimnum(first(iter), dims), check_stack_dims(iter))

This should allow passing Dimension/Symbol etc

Author

What about:

Base._stack(dims::Dimension, iter) = Base._stack(typeof(dims), iter)
Base._stack(dims::Type{<:Dimension}, iter) = Base._stack(dimnum(first(iter), dims), iter)

This helps because the default dims in Base's Base.stack is (oddly) dims=:. This has the same effect as dims=ndims(first(iter))+1, as far as I can tell, but it doesn't play well with dimnums in this context.
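The equivalence can be checked on plain arrays. A minimal sketch, assuming Julia 1.9 or later; note that it holds when the outer container is a vector, whereas for a multi-dimensional container dims=: appends all of the container's axes after those of the elements:

```julia
# Plain Base arrays only; `stack` requires Julia 1.9 or later.
A = [1 2 3; 4 5 6]
B = [7 8 9; 10 11 12]
v = [A, B]                   # Vector of 2×3 matrices

size(stack(v))               # (2, 3, 2): default dims=:
same = stack(v) == stack(v; dims=ndims(first(v)) + 1)   # true

# With a 3×4 Matrix of 2-element vectors, dims=: gives
# (size(first(m))..., size(m)...).
m = [[i, j] for i in 1:3, j in 1:4]
size(stack(m))               # (2, 3, 4)
```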

Owner

@rafaqz rafaqz Nov 17, 2024

Try to avoid dispatch on the underscore as much as possible... And we also need to accept Symbol. I would just define another method like _maybe_dimnum that has dispatch for Colon and Int that don't call dimnum and everything else uses dimnum
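A sketch of what that could look like, under the assumption that dimnum accepts a Dimension, a Dimension type, or a Symbol for a DimArray. The name `_maybe_dimnum` comes from this comment; the bodies below are illustrative rather than the code that ended up in the PR:

```julia
using DimensionalData

# Colon and Int pass straight through, so Base's own handling applies.
_maybe_dimnum(x, dims::Colon) = dims
_maybe_dimnum(x, dims::Int) = dims
# Everything else (Dimension, Dimension type, Symbol) is resolved by dimnum.
_maybe_dimnum(x, dims) = dimnum(x, dims)

da = DimArray([1 2 3; 4 5 6], (X(4.0:5.0), Y(6.0:8.0)))

_maybe_dimnum(da, :)    # Colon, unchanged
_maybe_dimnum(da, 2)    # 2, unchanged
_maybe_dimnum(da, Y)    # position of the Y dimension
_maybe_dimnum(da, :Y)   # same lookup by Symbol
```

This keeps the dispatch on a package-internal helper instead of adding methods to Base._stack.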

src/array/methods.jl (outdated, resolved)
x = DimArray([da, ca], (Dim{:a}(1:2),))
sx = stack(x; dims=1)
sy = @test_nowarn stack(x; dims=:)
sz = @test_nowarn stack(x; dims=X)
Owner

What happens with dims=Z ?

Author

Currently, dims=Z (where Z is not a dimension in the DimArrays being stacked) falls back to adding the new dimension at the end (so the result is the same as for the default behavior):

_maybe_dimnum(x, dim) = hasdim(x, dim) ? dimnum(x, dim) : ndims(x) + 1

Should we throw an error instead, like when an out-of-range Integer dim is given? I.e.:

function _maybe_dimnum(x, dim::Int)
    if dim < ndims(x) + 2
        return dim
    else
        throw(ArgumentError(LazyString("cannot stack slices ndims(x) = ", ndims(x) + 1, " along dims = ", dim)))
    end
end

Owner

I was thinking we should add a dimension like it is currently, but we can set it to be a Z dimension

Author

@brendanjohnharris brendanjohnharris Dec 9, 2024

Using the current minimal approach, stacking with a new dimension Z would be:

a = [1 2 3; 4 5 6]
da = DimArray(a, (X(4.0:5.0), Y(6.0:8.0)))
b = [7 8 9; 10 11 12]
db = DimArray(b, (X(4.0:5.0), Y(6.0:8.0)))
x = DimArray([da, db], (Dim{:a}(1:2),))

stack(set(x, :a=>Z)) # Or
set(stack(x), :a=>Z)

We could have this behavior be automatic (without overloading Base._stack) with:

function Base.stack(iter::AbstractArray{<:AbstractDimArray}; dims=:)
    x = Base._stack(_maybe_dimnum(first(iter), dims), check_stack_dims(iter))
    if !hasdim(x, dims) && dims isa Union{Dimension,Type{<:Dimension}}
        x = set(x, DimensionalData.dims(x)[end] => dims)
    end
    return x
end

But I wonder if it could be confusing to have this different behavior when dims <: Dimension and hasdim(x, dims) (the new dimension is inserted before dims but keeps the original name) versus !hasdim(x, dims) (the new dimension is inserted at the end and renamed to dims); it also means the return type can't be inferred.

Owner

@rafaqz rafaqz Dec 10, 2024

Currently, dims=Z (where Z is not a dimension in the DimArrays being stacked) falls back to adding the new dimension at the end (so the result is the same as for the default behavior)

I'm not sure we are understanding each other... My only ask here was that this new dimension is a Z dimension if dims=Z was used

src/array/methods.jl (resolved)