Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul DimTable #536

Merged
merged 25 commits into from
Sep 19, 2023
Merged

Overhaul DimTable #536

merged 25 commits into from
Sep 19, 2023

Conversation

JoshuaBillson
Copy link
Contributor

The current implementation of DimTable has a number of drawbacks. In particular, we need to support the following:

  1. Can handle data with many columns.
  2. Can unfold specified dimensions of an AbstractDimArray into columns.
  3. Can construct an AbstractDimStack from a table.
  4. Compatibility with GeoStats.jl.

To solve this, I've implemented the WideDimTable type. I've also implemented a new column, MergedDimColumn, which wraps two or more DimColumns to produce a tuple of points under the :geometry header. To combine dimensions, we can pass mergedims=true. We can also treat a single dimension of an AbstractDimArray as though it was layers of an AbstractDimStack by passing layersfrom=Dimension. Here's an example of the interface thus far:

julia> dimarray = DimArray(rand(Float32, 512, 512, 5), (X, Y, Dim{:band}));

julia> WideDimTable(dimarray) |> DataFrame
1310720×4 DataFrame
     Row │ X      Y      band   value      
         │ Int64  Int64  Int64  Float32    
─────────┼─────────────────────────────────
       11      1      1  0.366132
       22      1      1  0.00185609
       33      1      1  0.718482
       44      1      1  0.13569
                        
 1310717509    512      5  0.78933
 1310718510    512      5  0.929937
 1310719511    512      5  0.455286
 1310720512    512      5  0.252793
                       1310712 rows omitted

julia> WideDimTable(dimarray, layersfrom=Dim{:band}) |> DataFrame
262144×7 DataFrame
    Row │ X      Y      band_1      band_2     band_3      band_4     band_5    
        │ Int64  Int64  Float32     Float32    Float32     Float32    Float32   
────────┼───────────────────────────────────────────────────────────────────────
      11      1  0.366132    0.466749   0.340366    0.459397   0.17998
      22      1  0.00185609  0.219978   0.00102282  0.729957   0.295339
      33      1  0.718482    0.0594266  0.80399     0.604377   0.076027
      44      1  0.13569     0.326482   0.464005    0.406371   0.227705
                                                           
 262141509    512  0.922686    0.0885217  0.616942    0.292217   0.78933
 262142510    512  0.960718    0.931053   0.269816    0.220552   0.929937
 262143511    512  0.843416    0.339335   0.0562063   0.93825    0.455286
 262144512    512  0.165067    0.60916    0.878569    0.731508   0.252793
                                                             262136 rows omitted

julia> WideDimTable(dimarray, layersfrom=Dim{:band}, mergedims=true) |> DataFrame
262144×6 DataFrame
    Row │ geometry    band_1      band_2     band_3      band_4     band_5    
        │ Tuple      Float32     Float32    Float32     Float32    Float32   
────────┼─────────────────────────────────────────────────────────────────────
      1 │ (1, 1)      0.366132    0.466749   0.340366    0.459397   0.17998
      2 │ (2, 1)      0.00185609  0.219978   0.00102282  0.729957   0.295339
      3 │ (3, 1)      0.718482    0.0594266  0.80399     0.604377   0.076027
      4 │ (4, 1)      0.13569     0.326482   0.464005    0.406371   0.227705
                                                        
 262141 │ (509, 512)  0.922686    0.0885217  0.616942    0.292217   0.78933
 262142 │ (510, 512)  0.960718    0.931053   0.269816    0.220552   0.929937
 262143 │ (511, 512)  0.843416    0.339335   0.0562063   0.93825    0.455286
 262144 │ (512, 512)  0.165067    0.60916    0.878569    0.731508   0.252793
                                                           262136 rows omitted

@rafaqz
Copy link
Owner

rafaqz commented Sep 9, 2023

This looks great. I'm wondering about the relationship of DimTable and WideDimeTable. Its good that we keep both for backwards compat and type stability reasons.

But maybe we should as much as possible keep the interface the same, and add layersfrom=nothing, mergedims=false keywords to DimTable as well?

Copy link
Owner

@rafaqz rafaqz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up. Main comments are it would be good to reduce code duplication and unnecessary differences with DimTable - it would be clean if they only differ on the type stability of columns.

Other than that a bunch of minor improvements and questions.

src/tables.jl Show resolved Hide resolved
src/tables.jl Outdated Show resolved Hide resolved
src/tables.jl Show resolved Hide resolved
src/tables.jl Outdated Show resolved Hide resolved
src/tables.jl Outdated Show resolved Hide resolved
src/tables.jl Outdated Show resolved Hide resolved
src/tables.jl Outdated Show resolved Hide resolved
@JoshuaBillson
Copy link
Contributor Author

JoshuaBillson commented Sep 9, 2023

This looks great. I'm wondering about the relationship of DimTable and WideDimeTable. Its good that we keep both for backwards compat and type stability reasons.

But maybe we should as much as possible keep the interface the same, and add layersfrom=nothing, mergedims=false keywords to DimTable as well?

I mostly kept DimTable to benchmark it against WideDimTable. If we kept both, under what conditions would we use one over the other? Are we planning on exposing an interface for users to explicitly choose?

@rafaqz
Copy link
Owner

rafaqz commented Sep 9, 2023

Im happy to merge them, but we need to keep something called DimTable.

Im also not used to this PR style of adding new code then switching it out layer. The downside is not seeing the specific changes being made line by line.

If you want to remove tge current DimTable this code needs to diff it pretty cleanly.

@JoshuaBillson
Copy link
Contributor Author

I've implemented an unmergedims method for undoing the effects of mergedims on both AbstractDimArray and AbstractDimStack. This should get us a good way towards restoring a stack or array from a table.

julia> dimarray = DimArray(rand(Float32, 512, 512, 5), (X, Y, Dim{:band}));

julia> mergedarray = mergedims(dimarray, (X,Y)=>:geometry)
5×262144 DimArray{Float32,2} with dimensions: 
  Dim{:band},
  Dim{:geometry} MergedLookup{Tuple{Int64, Int64}} Tuple{Int64, Int64}[(1, 1), (2, 1), , (511, 512), (512, 512)] X, Y
  (1, 1)    (2, 1)    (3, 1)      (4, 1)    (5, 1)    (6, 1)    (7, 1)     (8, 1)       (508, 512)   (509, 512)   (510, 512)   (511, 512)   (512, 512)
 0.408622  0.886254  0.0976083   0.923665  0.425342  0.776444  0.171379   0.0781628     0.892641     0.0971096    0.276245     0.858681     0.405161
 0.899253  0.418132  0.0659314   0.339008  0.980397  0.496013  0.0799317  0.883041      0.873571     0.70769      0.808338     0.316129     0.994636
 0.469823  0.410051  0.00577027  0.512094  0.368832  0.216379  0.978313   0.763036      0.715357     0.340662     0.561662     0.166867     0.482182
 0.909567  0.287847  0.373506    0.121887  0.103681  0.52259   0.726426   0.262001      0.537583     0.392537     0.907184     0.57249      0.0316634
 0.141684  0.509818  0.94262     0.606817  0.172232  0.548815  0.32792    0.898369     0.531061     0.248262     0.758778     0.493107     0.10552

julia> unmerged = unmergedims(mergedarray, dims(dimarray))
512×512×5 DimArray{Float32,3} with dimensions: X, Y, Dim{:band}
[:, :, 1]
 0.408622   0.1928     0.363784   0.20637    0.0623429  0.540009  0.870785   0.6065        0.380542   0.708444   0.621855   0.654291   0.307949    0.583784
 0.886254   0.222856   0.810224   0.247393   0.164376   0.529767  0.0276215  0.449118       0.309448   0.0370631  0.0928274  0.727497   0.95558     0.579522
 0.0976083  0.47824    0.0147222  0.642226   0.9596     0.127347  0.137955   0.247403       0.678952   0.43045    0.582117   0.532908   0.219411    0.786123
 0.923665   0.393615   0.434965   0.946112   0.894384   0.950089  0.50744    0.408371       0.0949435  0.265479   0.585809   0.624134   0.755077    0.373703
 0.425342   0.220184   0.862294   0.944012   0.75059    0.755926  0.486645   0.838939       0.802087   0.712398   0.581404   0.476088   0.514338    0.513238
 0.776444   0.27527    0.940428   0.425444   0.5388     0.226998  0.197969   0.376402      0.948172   0.157085   0.702399   0.0637432  0.579163    0.123023
 0.171379   0.643685   0.814836   0.162074   0.205144   0.631337  0.651405   0.860217       0.583075   0.807928   0.618025   0.40905    0.0272511   0.418449
 0.0781628  0.318983   0.85593    0.493423   0.456952   0.163325  0.214463   0.892753       0.0738822  0.376432   0.018835   0.664184   0.16614     0.0338068
 0.895283   0.428974   0.66919    0.743919   0.402517   0.699578  0.166144   0.504705       0.471096   0.526914   0.256205   0.669583   0.979642    0.713561
 0.607489   0.676163   0.337688   0.595755   0.261997   0.370041  0.994099   0.87605        0.61286    0.617319   0.0632272  0.753327   0.00494283  0.0277535
                                                                                                                                                
 0.505413   0.932647   0.466523   0.176984   0.519687   0.879219  0.0315067  0.453979      0.357632   0.643493   0.867573   0.369308   0.908243    0.41651
 0.433541   0.176856   0.982449   0.620943   0.874699   0.85108   0.134078   0.00542814     0.215931   0.588794   0.308641   0.174514   0.492008    0.27178
 0.366505   0.224207   0.907218   0.903873   0.491976   0.980417  0.463662   0.931333       0.138668   0.0722253  0.0868067  0.826775   0.812139    0.441321
 0.235435   0.286471   0.402834   0.830526   0.594737   0.383988  0.312103   0.348005       0.10901    0.940322   0.692892   0.81595    0.967958    0.677501
 0.850423   0.586668   0.253597   0.274876   0.997255   0.576228  0.0167348  0.0360385      0.587654   0.626096   0.952066   0.645478   0.482546    0.379
 0.477778   0.698586   0.237578   0.646602   0.717585   0.644979  0.729319   0.622078      0.684544   0.530124   0.92819    0.947521   0.971794    0.892641
 0.52555    0.299894   0.0815916  0.0887221  0.982812   0.896342  0.659178   0.416482       0.701961   0.518297   0.864392   0.403778   0.528058    0.0971096
 0.135391   0.0488768  0.390891   0.467249   0.559043   0.609367  0.404012   0.843719       0.602538   0.0817617  0.0658015  0.943544   0.0492677   0.276245
 0.502091   0.0233992  0.843368   0.827739   0.210922   0.142626  0.693656   0.524975       0.177573   0.25404    0.256602   0.493337   0.823688    0.858681
 0.33225    0.0327858  0.743641   0.475116   0.412457   0.970831  0.504166   0.0251963      0.462065   0.877883   0.299307   0.44449    0.986187    0.405161
[and 4 more slices...]

julia> all(dimarray .== unmerged)
true

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2023

We might want to rebase this against #477 so the mergedims code is attributed to @sethaxen and the other changes are included

@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2023

Codecov Report

Merging #536 (59f10a7) into main (4ce2294) will decrease coverage by 6.52%.
Report is 81 commits behind head on main.
The diff coverage is 86.91%.

@@            Coverage Diff             @@
##             main     #536      +/-   ##
==========================================
- Coverage   89.64%   83.13%   -6.52%     
==========================================
  Files          39       40       +1     
  Lines        2743     3297     +554     
==========================================
+ Hits         2459     2741     +282     
- Misses        284      556     +272     
Files Changed Coverage Δ
src/DimensionalData.jl 100.00% <ø> (ø)
src/Dimensions/Dimensions.jl 100.00% <ø> (ø)
src/LookupArrays/LookupArrays.jl 100.00% <ø> (ø)
src/interface.jl 100.00% <ø> (ø)
src/plotrecipes.jl 82.57% <0.00%> (-1.93%) ⬇️
src/set.jl 94.44% <ø> (ø)
src/stack/methods.jl 89.47% <0.00%> (-6.69%) ⬇️
src/array/show.jl 90.74% <25.00%> (-4.36%) ⬇️
src/LookupArrays/indexing.jl 75.00% <50.00%> (+15.00%) ⬆️
src/stack/show.jl 97.50% <66.66%> (-2.50%) ⬇️
... and 14 more

... and 6 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@rafaqz rafaqz mentioned this pull request Sep 12, 2023
@rafaqz
Copy link
Owner

rafaqz commented Sep 14, 2023

Is this one ready to merge? Would be good to get it in soon for the breaking release (I'm assuming it is slightly breaking for Dimtable? If not take your time.)

@JoshuaBillson
Copy link
Contributor Author

Is this one ready to merge? Would be good to get it in soon for the breaking release (I'm assuming it is slightly breaking for Dimtable? If not take your time.)

I've finished adding new test cases and updated the docs. We're passing all the original test cases, so I don't think there should be any breaking changes. Nonetheless, it would be best to do a breaking release just to be safe.

I still need to implement a method for restoring an AbstractDimArray or AbstractDimStack from a table, but we can probably make that a separate PR.

src/array/array.jl Outdated Show resolved Hide resolved
src/array/array.jl Outdated Show resolved Hide resolved
src/array/array.jl Show resolved Hide resolved
src/array/array.jl Show resolved Hide resolved
@rafaqz
Copy link
Owner

rafaqz commented Sep 19, 2023

Ok to merge?

@JoshuaBillson
Copy link
Contributor Author

Ok to merge?

Yes, I think we can merge.

@rafaqz
Copy link
Owner

rafaqz commented Sep 19, 2023

Thanks. Great PR too, very much appreciated.

@rafaqz rafaqz merged commit abfbf2d into rafaqz:main Sep 19, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants