Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer overflow on mean #861

Open
DanDeepPhase opened this issue Nov 15, 2024 · 3 comments
Open

Integer overflow on mean #861

DanDeepPhase opened this issue Nov 15, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@DanDeepPhase
Copy link

Taking the elementwise mean of a vector of Integer DimArrays overflows integer bounds

Base:

julia> M= fill(UInt16(32000), 2)
2-element Vector{UInt16}:
 0x7d00
 0x7d00

julia> mean([M, M])               # sum < maxval
2-element Vector{Float64}:
 32000.0
 32000.0

julia> mean([M, M, M])          # sum > maxval
2-element Vector{Float64}:
 32000.0
 32000.0

Whether the sum is greater or less than the integer limits, the answer is correct

Equivalent math with DimArrays:

julia> D = DimArray(M, X(1:2))
╭──────────────────────────────╮
│ 2-element DimArray{UInt16,1} │
├──────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
   X Sampled{Int64} 1:2 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 1  0x7d00
 2  0x7d00

julia> mean([D, D])          # sum < maxval
╭───────────────────────────────╮
│ 2-element DimArray{Float64,1} │
├───────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
   X Sampled{Int64} 1:2 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 1  32000.0
 2  32000.0

julia> mean([D, D, D])          # sum > maxval
╭───────────────────────────────╮
│ 2-element DimArray{Float64,1} │
├───────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
   X Sampled{Int64} 1:2 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 1  10154.7
 2  10154.7

A different answer for mean of two 32000s and three 32000s where the maxval on the int is 65535.

Guessing at this, but is the order of operations:
Base: convert to float |> sum |> divide by N
DimensionalData: sum |> convert to float |> divide by N

taking the mean of a DimArray by itself does not have any overflow issues.

julia> D3 = DimArray(fill(UInt16(32000),3), X(1:3))
╭──────────────────────────────╮
│ 3-element DimArray{UInt16,1} │
├──────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
   X Sampled{Int64} 1:3 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 1  0x7d00
 2  0x7d00
 3  0x7d00

julia> mean(D3)
32000.0
@rafaqz
Copy link
Owner

rafaqz commented Nov 16, 2024

You order of ops does look like what is happening. But we don't actually do any of the math here.

So this is the difference between a Base specialisation on Vector and the fallback for AbstractArray ? Or something like that.

May be worth a base Julia issue. The sum reduction needs to have init=fill!(similar(A, Float64), 0.0) to force the conversion to Float64. Maybe they don't because that will fail on some array types like StaticArrays.jl.

@DanDeepPhase
Copy link
Author

I followed it into Statistics https://github.com/JuliaStats/Statistics.jl/blob/d49c2bf4f81e1efb4980a35fe39c815ef8396297/src/Statistics.jl#L179-L198

So I think the issue comes out of a difference the results of the type conversion. I've removed the identity, sum and convert:

julia> M= fill(UInt16(32000), 2);
julia> D = DimArray(M, X(1:2));

julia> promote_type(typeof(M/1), typeof(M))
Vector{Float64} (alias for Array{Float64, 1})

julia> promote_type(typeof(D/1), typeof(D))
DimVector{T, Tuple{X{Sampled{Int64, UnitRange{Int64}, ForwardOrdered, Regular{Int64}, Points, NoMetadata}}}, Tuple{}, A, Na, NoMetadata} where {T, A<:AbstractVector{T}, Na} (alias for DimArray{T, 1, Tuple{X{DimensionalData.Dimensions.Lookups.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Int64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, A, Na, 
DimensionalData.Dimensions.Lookups.NoMetadata} where {T, A<:AbstractArray{T, 1}, Na})

# Note:
# M   isa Array{UInt16,1}
# M/1 isa Array{Float64,1}
# D   isa DimArray{UInt16,1}
# D/1 isa DimArray{Float64,1}

I interpret this as

promote_type(Vector{Float64}, Vector{Int64}) => Vector{Float64}
promote_type(DimVector{Float64}, DimVector{Int64}) => DimVector{Any}

instead of DimVector{Float64}, so maybe the following convert doesn't do anything?

@rafaqz
Copy link
Owner

rafaqz commented Nov 18, 2024

Ok interesting, yes Any won't help. Maybe we are missing a promote_type definition.

We could add one that just forwards all the type parameters to promote_type ?

@rafaqz rafaqz added the bug Something isn't working label Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants