Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interleaved geometry #78

Open
3 tasks
dcooley opened this issue Apr 16, 2020 · 13 comments
Open
3 tasks

interleaved geometry #78

dcooley opened this issue Apr 16, 2020 · 13 comments

Comments

@dcooley
Copy link
Owner

dcooley commented Apr 16, 2020

coordinates stored in a single vector

[x,y,z,x,y,z,x,y,z,...]

And associated meta-data describing start indices, stride length


References & discussion


TODO

  • interleave geometry coordinates
  • follow design of flat lists / geo-arrow
  • polygons need inner-ring start_indices

Prototype

library(sfheaders)

sf <- mapdeck::roads

it <- sfheaders:::rcpp_interleave_sf( sf )

str( it )

List of 6
 $ coordinates      : num [1:115514] 145 -37.8 145 -37.8 145 ...
 $ start_indices    : int [1:18286] 0 20 46 62 70 74 81 91 115 125 ...
 $ n_coordinates    : int [1:18286] 20 26 16 8 4 7 10 24 10 10 ...
 $ total_coordinates: num 57757
 $ stride           : int 2
 $ data             :'data.frame':	18286 obs. of  15 variables:
  ..$ EZI_RDNAME: chr [1:18286] "MAIN YARRA TRAIL" "YARRA BOULEVARD" "UNNAMED" "YARRA-CITYLINK OUT RAMP ON" ...
  ..$ FQID      : num [1:18286] 5516 1347 5516 3316 3470 ...
  ..$ FROM_UFI  : num [1:18286] 16141399 2319828 16141401 39157640 2319921 ...
  ..$ FTYPE_CODE: chr [1:18286] "road" "road" "road" "road" ...
  ..$ LEFT_LOC  : chr [1:18286] "BURNLEY" "BURNLEY" "BURNLEY" "BURNLEY" ...
  ..$ PFI       : num [1:18286] 11815534 5684911 11749283 5684402 5684635 ...
  ..$ RD_NAME1  : chr [1:18286] "CAPITAL CITY" NA NA NA ...
  ..$ RD_NAME2  : chr [1:18286] NA NA NA NA ...
  ..$ RD_TYPE1  : chr [1:18286] "TRAIL" NA NA NA ...
  ..$ RD_TYPE2  : chr [1:18286] NA NA NA NA ...
  ..$ RIGHT_LOC : chr [1:18286] "BURNLEY" "BURNLEY" "BURNLEY" "BURNLEY" ...
  ..$ ROAD_NAME : chr [1:18286] "MAIN YARRA" "YARRA" "UNNAMED" "YARRA-CITYLINK OUT" ...
  ..$ ROAD_TYPE : chr [1:18286] "TRAIL" "BOULEVARD" NA "RAMP" ...
  ..$ TO_UFI    : num [1:18286] 39357869 39364350 2319308 39157656 45130685 ...
  ..$ UFI       : num [1:18286] 39352041 39364294 16140773 39157330 45113563 ...
dcooley added a commit that referenced this issue Apr 19, 2020
SymbolixAU pushed a commit that referenced this issue Apr 22, 2020
dcooley added a commit that referenced this issue Apr 22, 2020
dcooley added a commit that referenced this issue Apr 22, 2020
dcooley added a commit that referenced this issue Apr 23, 2020
@SymbolixAU
Copy link
Contributor

some benchmarks

library(microbenchmark)

## Need to get this speed down.
microbenchmark(
  df_T = { sfheaders::sf_to_df( mapdeck::roads, fill = T ) },
  df_F = { sfheaders::sf_to_df( mapdeck::roads, fill = F ) },
  it = { sfheaders:::rcpp_interleave_sf( mapdeck::roads ) },
  times = 10
)

# Unit: milliseconds
# expr      min       lq     mean   median       uq      max neval
# df_T 29.93938 30.87460 32.42612 32.43507 34.31048 34.82401    10
# df_F 23.60596 24.32471 28.00786 28.39502 29.48871 34.00938    10
#   it 28.92496 30.99812 34.33903 35.01555 36.09556 40.84551    10

## interleaving benchmarks

library(Rcpp)

cppFunction(
  
  code = '
    Rcpp::NumericVector interleave( Rcpp::NumericMatrix& mat ) {
    
      R_xlen_t n_col = mat.ncol();
      R_xlen_t n_row = mat.nrow();
      Rcpp::NumericVector res( n_col * n_row );
      R_xlen_t i, j;
      R_xlen_t position_counter = 0;
      for( i = 0; i < n_row; ++i ) {
        for( j = 0; j < n_col; ++j ) {
          res[ position_counter ] = mat( i, j );
          position_counter = position_counter + 1;
        }
      }
      return res;
      
    }
  '
)

cppFunction(
  code = '
    Rcpp::NumericVector interleave_t( Rcpp::NumericMatrix& mat ) {
      Rcpp::NumericMatrix m2 = Rcpp::transpose( mat );
      m2.attr("dim") = R_NilValue;
      return m2;
    }
  '
)

cppFunction(
  code = '
    Rcpp::NumericVector interleave_cols( Rcpp::NumericMatrix& mat ) {
      Rcpp::NumericVector col1 = mat( Rcpp::_, 0 );
      Rcpp::NumericVector col2 = mat( Rcpp::_, 1 );
      R_xlen_t n_row = mat.nrow();
      Rcpp::NumericVector res( n_row + n_row );
      R_xlen_t i;
      R_xlen_t counter = 0;
      for( i = 0; i < n_row; ++i ) {
        res[ counter ] = col1[ i ];
        res[ counter + 1 ] = col2[ i ];
        counter = counter + 1;
      }
      return res;
    }
  '
)

## https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/vector/Matrix.h#L405
cppFunction(
  code = '
    Rcpp::NumericVector interleave_impl( Rcpp::NumericMatrix& mat ) {
      R_xlen_t nrow = mat.nrow(), ncol = mat.ncol();
      R_xlen_t len = nrow * ncol;
      R_xlen_t len2 = len - 1;
      Rcpp::NumericVector res( len );
      R_xlen_t i, j;
      for( i = 0, j = 0; i < len; ++i, j += nrow ) {
        if (j > len2) j -= len2;
        res[i] = mat[j];
      }
      return res;
    }
  '
)


m <- matrix(rnorm(3e1), ncol = 3)
as.numeric( t(m) )
interleave_t( m )
interleave_impl( m )



library(microbenchmark)

m <- matrix(rnorm(3e7), ncol = 3)

microbenchmark(
  loop = { interleave( m ) },
  tran = { interleave_t( m ) },
  cols = { interleave_cols( m ) },
  impl = { interleave_impl( m ) },
  times = 25
)

# Unit: milliseconds
# expr      min       lq     mean   median       uq       max neval
# loop 55.18919 64.50406 70.85069 67.40234 71.77256 130.16606    25
# tran 51.28294 52.17571 60.82680 62.26534 68.09855  69.24250    25
# cols 60.38587 61.41036 64.24787 62.21885 67.20412  70.19668    25
# impl 48.87238 50.17013 58.46313 59.21523 65.89245  68.35909    25

@kylebarron
Copy link

It's been years since I've used R. If you have suggestions/comments about creating sf data structures from different types of Arrow structures, we'd love to hear.

@dcooley
Copy link
Owner Author

dcooley commented Apr 29, 2020

Cheers @kylebarron. From reading all the discussions I think the conversion from Arrow to sf will be pretty straight forward, so I'm trying to align these "interleaving" functions with what is being discussed.


And for what it's worth, this comment

One of the core concepts of Arrow is that it is a columnar layout, so which means all values in a single column (here: all geometries of one column) are stored in one contiguous array.

Is what I was thinking about when I asked this question

is your proposed size a single value, or an array? That is, is it possible (or does it make sense?) to encode two linestrings with different dimensions

(And like you I'm coming at this from a Deck.gl performance perspective)

@kylebarron
Copy link

Yes, I think I was a little confused. I didn't understand that each key of the struct was stored as a separate contiguous array. So the abstraction is that each struct is an independent record, but internally each value of the struct is an Array (possibly an Array of Arrays of Arrays ...). So the size is essentially an array internally that varies across rows, but not within a single record.

Anyways, I think there's still some good discussion to be had, so I wouldn't spend too much effort implementing any one proposal yet.

@dcooley
Copy link
Owner Author

dcooley commented Apr 30, 2020

Anyways, I think there's still some good discussion to be had, so I wouldn't spend too much effort implementing any one proposal yet.

yeah absolutely. I'm not "productionising" anything yet, just setting up some processes ready to hit whatever structure is needed.

@mdsumner
Copy link
Contributor

How could you encode structure in this interleave form? (I guess it's too early to say for sfheaders yeah)

There's no mapping between $data and $start_indices and I'm drawn to this comment: geoarrow/geoarrow#4 the 'into parts / into rings / into values' stuff

The discussion there leaves me wondering what to do about triangles, you really don't want expanded interleaved versions of those, and it seems missing when you can have a packed 'values' array and ' indexing values' (it's missing from the SF standard like a crime from long dead ancestors) so I wonder if you've encountered the topic elsewhere?

@dcooley
Copy link
Owner Author

dcooley commented Jul 15, 2020

How could you encode structure in this interleave form

what do you mean by "structure" exactly?

There's no mapping between $data and $start_indices

The index of the [start_indices] array is the row-index of [data]

@mdsumner
Copy link
Contributor

oh, sorry - quite right - I mean for multi-part types, is it just because it's LINESTRING above? - there's only run-length no parts- hierarchy (that's all I mean by structure)

@dcooley
Copy link
Owner Author

dcooley commented Jul 15, 2020

oh I see. Yeah I've not fully thought about it yet. Was going to come back to this once geometries and sfheaders are updated. There's a solution out there somewhere...

@mdsumner
Copy link
Contributor

Cool, it's interesting - so refreshing to see such discussion, and I was pretty dismissive at first, I was exploring these things long ago and it seems the important stuff is not where I expected

@mdsumner
Copy link
Contributor

The crux is in how we store indexed coordinates! I'm sure of it, we have to have triangles and segments in dense form and this seems so close to that now

@dcooley
Copy link
Owner Author

dcooley commented Jul 15, 2020

we have to have triangles

I should mention, my v0.0.1 plan of {interleave} is to create (only) POINTs, LINEs and Triangles (via ear-cutting) from any geometry (hence {geometries}), as these will feed directly into {mapdeck}.

@mdsumner
Copy link
Contributor

Oh and of course, that means run length is enough - no need for part/hole nonsense when you only have foreground pieces. That's worth dwelling on ... seems like good convergence!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants