-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hypertidy support #4
Comments
The
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
x <- minimal_mesh
## what PATH0 does
p <- PATH0(x)
## every path_ element is the index-trace through p$vertex, plus records which path we are
lapply(p$object$path_, function(x) x[c("vertex_", "path_")])
#> $`1`
#> # A tibble: 14 x 2
#> vertex_ path_
#> <int> <int>
#> 1 1 1
#> 2 2 1
#> 3 10 1
#> 4 12 1
#> 5 8 1
#> 6 11 1
#> 7 9 1
#> 8 1 1
#> 9 3 2
#> 10 6 2
#> 11 7 2
#> 12 5 2
#> 13 4 2
#> 14 3 2
#>
#> $`2`
#> # A tibble: 5 x 2
#> vertex_ path_
#> <int> <int>
#> 1 9 3
#> 2 11 3
#> 3 13 3
#> 4 14 3
#> 5 9 3
## what SC0 does
sc <- SC0(x)
## every topology_ element is the segment-start-end-trace through sc$vertex, plus what path we are
## .vx0, .vx1 are segment end pairs
lapply(sc$object$topology_, function(x) x)
#> $`1`
#> # A tibble: 12 x 3
#> .vx0 .vx1 path_
#> <int> <int> <int>
#> 1 1 2 1
#> 2 2 10 1
#> 3 10 12 1
#> 4 12 8 1
#> 5 8 11 1
#> 6 11 9 1
#> 7 9 1 1
#> 8 3 6 2
#> 9 6 7 2
#> 10 7 5 2
#> 11 5 4 2
#> 12 4 3 2
#>
#> $`2`
#> # A tibble: 4 x 3
#> .vx0 .vx1 path_
#> <int> <int> <int>
#> 1 9 11 3
#> 2 11 13 3
#> 3 13 14 3
#> 4 14 9 3
## what TRI0 does
tri <- TRI0(x)
## every topology_ element is the triangle-corner-trace through sc$vertex, plus what path we are
## .vx0, .vx1, .vx2 are triangle corner triplets
lapply(tri$object$topology_, function(x) x)
#> $`1`
#> # A tibble: 12 x 4
#> .vx0 .vx1 .vx2 path_
#> <int> <int> <int> <int>
#> 1 1 3 4 1
#> 2 6 3 1 1
#> 3 9 11 8 1
#> 4 8 12 10 1
#> 5 2 1 4 1
#> 6 6 1 9 1
#> 7 8 10 2 1
#> 8 2 4 5 1
#> 9 7 6 9 1
#> 10 8 2 5 1
#> 11 7 9 8 1
#> 12 8 5 7 1
#>
#> $`2`
#> # A tibble: 2 x 4
#> .vx0 .vx1 .vx2 path_
#> <int> <int> <int> <int>
#> 1 11 9 14 3
#> 2 14 13 11 3 Created on 2020-04-29 by the reprex package (v0.3.0) There are only 14 vertices, but they occur a total of 19 times (3 to close each path, 2 are on a shared segment). So, one step back from polygon(do.call(rbind, sc$object$topology_), c(".vx0", ".vx1"), "path_")
[[1]]
[,1] [,2]
[1,] 1 2
[2,] 2 10
[3,] 10 12
[4,] 12 8
[5,] 8 11
[6,] 11 9
[7,] 9 1
[[2]]
[,1] [,2]
[1,] 3 6
[2,] 6 7
[3,] 7 5
[4,] 5 4
[5,] 4 3
[[3]]
[,1] [,2]
[1,] 9 11
[2,] 11 13
[3,] 13 14
[4,] 14 9 So, I'm wondering if I should start to rejig things towards using matrices instead - but by far the messiest stuff in silicate is the de-duplication, which is a step prior to the split() part so I'm looking into that. |
(oop also clearly there's a bug in TRI0() not recording the path properly, but fwiw it's not used anywhere yet) |
And I went and tried the listList approach with great success, better sense than above with cppFunction(
depends = "geometries"
, includes = '#include "geometries/shapes/shapes.hpp"'
, code = '
SEXP mpolygon( SEXP df, SEXP geometry_cols, SEXP line_id , SEXP poly_id) {
return geometries::shapes::get_listListMat( df, geometry_cols, line_id, poly_id );
}
'
)
df <- dplyr::bind_rows(sc$object$topology_, .id = "object_")
mpolygon(df, c(".vx0", ".vx1"), "object_", "path_")
[[1]]
[[1]][[1]]
[,1] [,2]
[1,] 1 2
[2,] 2 10
[3,] 10 12
[4,] 12 8
[5,] 8 11
[6,] 11 9
[7,] 9 1
[[1]][[2]]
[,1] [,2]
[1,] 3 6
[2,] 6 7
[3,] 7 5
[4,] 5 4
[5,] 4 3
[[2]]
[[2]][[1]]
[,1] [,2]
[1,] 9 11
[2,] 11 13
[3,] 13 14
[4,] 14 9 |
So true. I think a "unique_coordinates()" would be a really useful feature. But I haven't found an easy way to impement this yet. |
I got a bit interested in rgl, in What I've used in the past is
I've used unjoin mostly because it's quite general and very easy to work with, but And with all that, I was a bit scared of the whole thing but it's time to revisit ;) |
Here is a re-imagined (De-duplication was always planned to be general, but I think non x-y is now better treated specially, for various reasons - but it seems unjoin still wins on that front). So, a non-dplyr new_meta <- function(crs) {
tibble::tibble(crs = crs, ctime = Sys.time())
}
new_PATH0 <- function(vertex, object, index, crs = NA_character_) {
object[["path_"]] <- index
structure(list(object = object,
vertex = vertex,
meta = new_meta(crs)),
class = c("path0", "PATH0", "sc"))
}
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
library(sfheaders)
path0 <- function(x) {
df <- sf_to_df(x)
## vertex_ is "unique coord"
#df[["vertex_"]] <- as.integer(factor(paste(df[["y"]], df[["x"]], sep = "|")))
df[["vertex_"]] <- dplyr::group_indices(df, .data$x, .data$y)
## the vertex table, separated out (we need vertex_ to remap)
v <- df[!duplicated(df[["vertex_"]]), c("x", "y", "vertex_")]
## now remap (can this be done better?)
## (it's the single most confusing thing I think ... and
## I ended up relying on unjoin)
df[["vertex_"]] <- match(df$vertex_, v$vertex_)
## cleanup
v[["vertex_"]] <- NULL
df[["x"]] <- NULL
df[["y"]] <- NULL
## feature id, object_, whatever the top level "keep" table has
o <- df[!duplicated(df["sfg_id"]), "sfg_id", drop = FALSE]
## path_ just for compatibility with current silicate
df[["path_"]] <- df[["linestring_id"]]
## only multipolygon
if ("multipolygon_id" %in% names(df)) {
df[["subobject"]] <- df[["polygon_id"]]
} else {
df[["subobject"]] <- 1
}
## the embeded indexes of the sequential paths, with split.data.frame
idx <- split(df[c("vertex_", "path_", "subobject")], df[["sfg_id"]])
names(v) <- c("x_", "y_") ## and these, compat with current silicate
df$sfg_id <- NULL
crs <- NA_character_
new_PATH0(v, o, idx, crs = crs)
}
plot(path0(inlandwaters)) # plot(path0(sf::st_cast(inlandwaters, "POLYGON")))
# plot(path0(sf::st_cast(inlandwaters, "MULTILINESTRING")))
plot(path0(minimal_mesh)) #plot(path0(sf::st_sf(g = sf::st_sfc(sfzoo$multipolygon))))
#plot(TRI0(path0(minimal_mesh)), col = c("grey", "firebrick"), border = "black")
## etc.
rbenchmark::benchmark(path0(inlandwaters),
PATH0(inlandwaters), replications = 20)
#> test replications elapsed relative user.self sys.self
#> 1 path0(inlandwaters) 20 1.99 1.137 1.83 0.15
#> 2 PATH0(inlandwaters) 20 1.75 1.000 1.61 0.15
#> user.child sys.child
#> 1 NA NA
#> 2 NA NA Created on 2020-04-29 by the reprex package (v0.3.0) (Trust you don't mind me using this issue for reams of examples, it's helpful for me to air it in this way). |
Exactly what I need too. |
One key part of the design of silicate that is missing above in Then, models like |
A kind of progress report: I'm in the middle of a I'm hoping now to at least cleanup the way this is done, with all the index juggling that's grown organically - into something easier for us to discuss. |
If it helps, I'm working on an update to |
This has been so worthwhile, faster, easier: new_TRI0 <- function(vertex, object, index, crs = NA_character_, meta = NULL) {
meta1 <- tibble::tibble(proj = crs, ctime = Sys.time())
if (!is.null(meta)) {
meta <- rbind(meta1, meta)
}
object[["topology_"]] <- index
structure(list(object = object, vertex = vertex,
meta = meta), class = c("TRI0", "sc"))
}
## build TRI0 with sfheaders
tri0 <- function(x, ...) {
df <- sfheaders::sf_to_df(x)
crs <- crsmeta::crs_proj(x)
if (length(grep("POLYGON", class(x[[attr(x, "sf_column")]]) )) < 1) {
stop("only polygon")
}
x[[attr(x, "sf_column")]] <- NULL
object <- tibble::as_tibble(x)
object$object_ <- 1:nrow(object)
## deduplicate in xy
df[["vertex_"]] <- dplyr::group_indices(df, .data$x, .data$y)
## the vertex table, separated out (we need vertex_ to remap)
v <- df[!duplicated(df[["vertex_"]]), c("x", "y", "vertex_")]
## now remap (can this be done better?)
## (alt. is unjoin())
df[["vertex_"]] <- match(df$vertex_, v$vertex_)
df[["path_"]] <- as.integer(factor(df[["linestring_id"]]))
## cleanup
v[["vertex_"]] <- NULL
df[["x"]] <- NULL ## not really necessary to remove but highights the point
df[["y"]] <- NULL ## that these are now indexed in 'v'
featurelist <- split(df, df$sfg_id)
feature_triangles <- vector("list", length(featurelist))
for (i in seq_along(featurelist)) {
polylist <- split(featurelist[[i]],
featurelist[[i]]$polygon_id)
polytriangles <- vector("list", length(polylist))
for (j in seq_along(polylist)) {
## j == 1 if POLYGON
polygon <- polylist[[j]]
## in the past it was less clear to me how this
## coordinate table relates to the vertex table (much simpler now)
holes <- which(c(0, abs(diff(polygon$linestring_id))) > 0)
if (length(holes) < 1) {
holes <- 0
}
xy <- cbind(v[["x"]][polygon$vertex_],
v[["y"]][polygon$vertex_])
tris <- matrix(decido::earcut(xy, holes), ncol = 3L, byrow = TRUE)
polytriangles[[j]] <- tibble::tibble(.vx0 = polygon$vertex_[tris[,1L]],
.vx1 = polygon$vertex_[tris[,2L]],
.vx2 = polygon$vertex_[tris[,3L]],
path_ = polygon$path_[1L])
#decido::plot_ears(xy, trindex)
#Sys.sleep(0.1)
}
feature_triangles[[i]] <- do.call(rbind, polytriangles)
}
names(v) <- c("x_", "y_")
new_TRI0(v, object, feature_triangles, crs = crs)
}
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
plot(tri0(inlandwaters)) ## reasonable speed up
rbenchmark::benchmark(tri0(inlandwaters), TRI0(inlandwaters),
replications = 7)
#> test replications elapsed relative user.self sys.self
#> 1 tri0(inlandwaters) 7 3.22 1.000 3.17 0.03
#> 2 TRI0(inlandwaters) 7 10.79 3.351 10.50 0.28
#> user.child sys.child
#> 1 NA NA
#> 2 NA NA
nc <- sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE))
plot(tri0(nc)) Created on 2020-05-04 by the reprex package (v0.3.0) |
It doesn't help here because having an index into unique coordinates is really key to the current structure of the silicate models - but it definitely helps in general for flexibility! (The edge-based triangulation was where I started this whole thing - and the automatic topology you get from shared vertices was key - so it's in there as a core even though it doesn't always need to be. Ear-cutting was a step back from RTriangle's license requirement, so that silicate could triangulate on its own, and it's also very different - path vs. edge triangulating for polys). |
Here's SC0, no speed up but comparable. The key to speed in this was using The group_indices stuff maintains path ID (i.e. linestring_id) but it doesn't record the parent polygon_id or multilinestring id - I think it otherwise works for any line or polygon input. new_SC0 <- function(vertex, object, index, crs = NA_character_, meta = NULL) {
meta1 <- tibble::tibble(proj = crs, ctime = Sys.time())
if (!is.null(meta)) {
meta <- rbind(meta1, meta)
}
object[["topology_"]] <- index
structure(list(object = object, vertex = vertex,
meta = meta), class = c("SC0", "sc"))
}
## build SC0 with sfheaders
sc0 <- function(x, ...) {
df <- sfheaders::sf_to_df(x)
crs <- crsmeta::crs_proj(x)
x[[attr(x, "sf_column")]] <- NULL
object <- tibble::as_tibble(x)
object$object_ <- 1:nrow(object)
## deduplicate in xy
df[["vertex_"]] <- dplyr::group_indices(df, .data$x, .data$y)
## the vertex table, separated out (we need vertex_ to remap)
v <- df[!duplicated(df[["vertex_"]]), c("x", "y", "vertex_")]
## now remap (can this be done better?)
## (alt. is unjoin())
df[["vertex_"]] <- match(df$vertex_, v$vertex_)
## cleanup
v[["vertex_"]] <- NULL
df[["x"]] <- NULL ## not really necessary to remove but highlights the point
df[["y"]] <- NULL ## that these are now indexed in 'v'
## a global linestring_id
if ("multipolygon_id" %in% names(df)) {
df[["path_"]] <- dplyr::group_indices(df, .data$sfg_id,
.data$polygon_id,
.data$linestring_id)
} else {
df[["path_"]] <- dplyr::group_indices(df, .data$sfg_id,
.data$linestring_id)
}
featurelist <- split(df, df$sfg_id)
feature_segments <- vector("list", length(featurelist))
.path2seg <- function(x, pathid = NULL) {
cbind(.vx0 = x[-length(x)], .vx1 = x[-1L], path_ = pathid)
}
for (i in seq_along(featurelist)) {
segments <- lapply(split(featurelist[[i]][c("vertex_", "path_")],
featurelist[[i]]$path_),
function(lstring) .path2seg(lstring[["vertex_"]],
pathid = lstring[["path_"]][1L]))
feature_segments[[i]] <- tibble::as_tibble(do.call(rbind, segments))
}
names(v) <- c("x_", "y_")
new_SC0(v, object, feature_segments, crs = crs)
}
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
plot(sc0(inlandwaters))
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
suppressWarnings(plot(sc0(st_cast(inlandwaters, "POLYGON")))) suppressWarnings(xx <- sc0(st_cast(st_cast(inlandwaters, "POLYGON"), "LINESTRING")))
print(xx)
#> class : SC0
#> type : Structural
#> vertices : 30835 (2-space)
#> primitives : 33455 (2-space)
#> crs : NA
xx$object$color_ <- viridis::viridis(nrow(xx$object))
plot(xx) rbenchmark::benchmark(sc0(inlandwaters), SC0(inlandwaters),
replications = 7)
#> test replications elapsed relative user.self sys.self user.child
#> 1 sc0(inlandwaters) 7 0.96 1.032 0.88 0.08 NA
#> 2 SC0(inlandwaters) 7 0.93 1.000 0.89 0.02 NA
#> sys.child
#> 1 NA
#> 2 NA
nc <- sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE))
nc$color_ <- rainbow(nrow(nc), alpha = 0.6)
plot(sc0(nc)) Created on 2020-05-04 by the reprex package (v0.3.0) |
The non-dplyr |
Here I try direct from file so we can control GDAL directly to return WKB, then unpack with @paleolimbot's wk! (we need a bit of a group chat about column names ?) #remotes::install_github("paleolimbot/wk")
library(wk)
library(vapour)
file <- "list_locality_postcode_meander_valley.tab"
layer <- "list_locality_postcode_meander_valley"
## A MapInfo TAB file with polygons
f <- system.file(file.path("extdata/tab", file), package="vapour")
g <- tibble::tibble(g = vapour_read_geometry(f))
dim(wk::wkb_coords(g$g))
#> [1] 55387 9
new_TRI0 <- function(vertex, object, index, crs = NA_character_, meta = NULL) {
meta1 <- tibble::tibble(proj = crs, ctime = Sys.time())
if (!is.null(meta)) {
meta <- rbind(meta1, meta)
}
object[["topology_"]] <- index
structure(list(object = object, vertex = vertex,
meta = meta), class = c("TRI0", "sc"))
}
## build TRI0 with wk and vapour (along the lines of `tri0()` above but direct via GDAL)
tri_file <- function(x, ...) {
df <- wk::wkb_coords(vapour::vapour_read_geometry(x, ...))
df$linestring_id <- df$ring_id
df$polygon_id <- df$part_id
df$sfg_id <- df$feature_id
if (anyNA( df$polygon_id)) {
df$polygon_id[is.na(df$polygon_id)] <- 0
}
#crs <- crsmeta::crs_proj(x)
object <- tibble::tibble(object_ = seq_len(length(unique(df$feature_id))))
## deduplicate in xy
df[["vertex_"]] <- dplyr::group_indices(df, .data$x, .data$y)
## the vertex table, separated out (we need vertex_ to remap)
v <- df[!duplicated(df[["vertex_"]]), c("x", "y", "vertex_")]
## now remap (can this be done better?)
## (alt. is unjoin())
df[["vertex_"]] <- match(df$vertex_, v$vertex_)
df[["path_"]] <- as.integer(factor(df[["linestring_id"]]))
## cleanup
v[["vertex_"]] <- NULL
df[["x"]] <- NULL ## not really necessary to remove but highights the point
df[["y"]] <- NULL ## that these are now indexed in 'v'
featurelist <- split(df, df$sfg_id)
feature_triangles <- vector("list", length(featurelist))
for (i in seq_along(featurelist)) {
polylist <- split(featurelist[[i]],
featurelist[[i]]$polygon_id)
polytriangles <- vector("list", length(polylist))
for (j in seq_along(polylist)) {
## j == 1 if POLYGON
polygon <- polylist[[j]]
## in the past it was less clear to me how this
## coordinate table relates to the vertex table (much simpler now)
holes <- which(c(0, abs(diff(polygon$linestring_id))) > 0)
if (length(holes) < 1) {
holes <- 0
}
xy <- cbind(v[["x"]][polygon$vertex_],
v[["y"]][polygon$vertex_])
tris <- matrix(decido::earcut(xy, holes), ncol = 3L, byrow = TRUE)
polytriangles[[j]] <- tibble::tibble(.vx0 = polygon$vertex_[tris[,1L]],
.vx1 = polygon$vertex_[tris[,2L]],
.vx2 = polygon$vertex_[tris[,3L]],
path_ = polygon$path_[1L])
#decido::plot_ears(xy, trindex)
#Sys.sleep(0.1)
}
feature_triangles[[i]] <- do.call(rbind, polytriangles)
}
names(v) <- c("x_", "y_")
new_TRI0(v, object, feature_triangles, crs = NA_character_)
}
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
## now we can leverage vapour skip_n/limit_n/sql etc.
plot(tri_file(f, limit_n = 10)) plot(tri_file(f, sql = glue::glue("SELECT * FROM {layer} WHERE FID < 10"))) xx <- tri_file(f)
plot(xx, border = "grey") xx$object$color_ <- viridis::viridis(nrow(xx$object))
anglr::mesh_plot(xx)
#anglr::plot3d(xx) ## etc Created on 2020-05-06 by the reprex package (v0.3.0) |
Cool! My topology game is not great, but I think you could define a C++ Happy to change the column names to make the coord function play nicer with this! |
There's definitely no "one right way", and your nest column reminds me that we haven't covered enough bases. I don't mind changing names at all but the NA fill seems unnecessary? Thinking aloud, many functions that do a variety of encodings is important - but what about the two table approach as a core? One table is just geometry (xy or xyz etc) the other the run length of each linestring. Then you can have many classifiers at low storage cost, or eventually as virtual ALTREP columns like the vctrs rle stuff. |
Is that similar in concept to the prototype here? where |
Yes exactly! silicate::sc_path(sf)
# A tibble: 18,286 x 6
ncol type object_ subobject path_ ncoords_
<int> <chr> <chr> <dbl> <chr> <int>
1 2 LINESTRING zXpe9i 1 owja2l 20
2 2 LINESTRING KXeft1 1 saFXj6 26
3 2 LINESTRING aghWBa 1 5OJWj4 16
4 2 LINESTRING S58Z1Q 1 RUORz7 8
...
# ... with 18,276 more rows
silicate::sc_coord(sf)
# A tibble: 57,757 x 2
x_ y_
<dbl> <dbl>
1 145. -37.8
... (the IDs in sc_path are globally unique because reasons, but that leverages gibble to get the raw mapping). |
|
I chose those classifiers because they are "interpretation independent", only "path_" has a special meaning analogous to "linestring". The idea then was that the "type" nominates how you interpret things, for multilinestring you forget subobject and you just have all path_ within feature, with multipolygon the subobject controls the extra level. Global IDs allow to subset arbitrarily but I think that's less important than this more fundamental common stuff. |
then I think now my understanding is finally where yours was at a few years ago! This is also similar to the discussions around Arrow and GeoPandas. The big difference with silicate, as you point out, is the unique-ness. |
Yes that is probably overdone in hindsight because of where I started. All good, exciting times! |
My understanding is still way behind...I did make a When writing my coords thinger I was wondering if it would be more useful to have |
You're right it's absolutely not. I'm just really deep into it as a workaround for two in-memory formats. With all these new approaches it'll fade away because we have flexibility to cut to the chase 🤗 |
And yes to those functions that's precisely the right flexibility! |
I'm leaning more and more towards a simpler model at the base of silicate, this is new_idxTRI0 <- function(coord, object, index, crs = NA_character_, meta = NULL) {
meta1 <- tibble::tibble(proj = crs, ctime = Sys.time())
if (!is.null(meta)) {
meta <- rbind(meta1, meta)
}
object[["topology_"]] <- index
structure(list(object = object, coord = coord,
meta = meta), class = c("idxTRI0", "sc"))
}
# pure metal index
idxTRI0 <- function(x, ...) {
df <- sfheaders::sf_to_df(x)
df$coord_ <- seq_len(dim(df)[1L])
crs <- crsmeta::crs_proj(x)
if (length(grep("POLYGON", class(x[[attr(x, "sf_column")]]) )) < 1) {
stop("only polygon")
}
x[[attr(x, "sf_column")]] <- NULL
object <- tibble::as_tibble(x)
object$object_ <- 1:nrow(object)
featurelist <- split(df, df$sfg_id)
feature_triangles <- vector("list", length(featurelist))
for (i in seq_along(featurelist)) {
polylist <- split(featurelist[[i]],
featurelist[[i]]$polygon_id)
polytriangles <- vector("list", length(polylist))
for (j in seq_along(polylist)) {
polygon <- polylist[[j]]
holes <- which(c(0, abs(diff(polygon$linestring_id))) > 0)
if (length(holes) < 1) {
holes <- 0
}
xy <- polygon[c("x", "y")]
tris <- matrix(decido::earcut(xy, holes), ncol = 3L, byrow = TRUE)
polytriangles[[j]] <- cbind(.vx0 = polygon$coord_[tris[,1L]],
.vx1 = polygon$coord_[tris[,2L]],
.vx2 = polygon$coord_[tris[,3L]])
}
feature_triangles[[i]] <- do.call(rbind, polytriangles)
}
coords <- df[c("x", "y")]
names(coords) <- c("x_", "y_")
new_idxTRI0(coords, object, feature_triangles, crs = crs)
} There's no print, plot, or any support in silicate yet (the 'coord' - not 'vertex' - table is a break from every other model, so But it's compelling:
Given everything I've learnt recently, the de-duplication has been unnecessary a lot of the time (it's required for constrained triangulation with RTriangle where I started, and it'll always be necessary for high quality triangulation - and all the topology stuff requires it, but do when necessary is I think better). |
Leaving this code here while I experiment library(Rcpp)
cppFunction(
depends = c("decido", "sfheaders")
, include = c(
'#include "decido/decido.hpp"'
, '#include "sfheaders/df/sf.hpp"'
)
, code = '
SEXP tris( Rcpp::DataFrame sf ) {
std::string geom_col = sf.attr("sf_column");
Rcpp::List sfc = sf[ geom_col ];
Rcpp::DataFrame df = sfheaders::df::sf_to_df( sf, false );
Rcpp::NumericVector x = df["x"];
Rcpp::NumericVector y = df["y"];
R_xlen_t i;
R_xlen_t n = sfc.length();
Rcpp::List res( n );
R_xlen_t total_rows = 0; // for keeping track of final number of coordinates
for( i = 0; i < n; ++i ) {
SEXP polygon = sfc[ i ];
Rcpp::IntegerVector idx = decido::api::earcut( polygon );
Rcpp::NumericVector xx = x[ idx ];
Rcpp::NumericVector yy = y[ idx ];
total_rows = total_rows + xx.length();
res[ i ] = Rcpp::List::create(
Rcpp::_["x"] = xx,
Rcpp::_["y"] = yy
);
}
// TODO - collapse_list is moving to geometrires::
Rcpp::List coords = sfheaders::df::collapse_list( res, total_rows );
return Rcpp::List::create(
Rcpp::_["data"] = sf, // TODO: remove geometry
Rcpp::_["coord"] = coords
);
}
'
)
sf <- sfheaders::sf_cast( silicate::inlandwaters, "POLYGON" )
tri1 <- idxTRI0(sf)
tri2 <- tris( sf )
microbenchmark::microbenchmark(
sc = { idxTRI0(sf) },
cpp = { tris( sf ) },
times = 25
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# sc 82.55239 84.01620 85.60736 84.72283 86.78654 91.75369 25
# cpp 15.44566 16.09095 17.82840 17.75326 19.33051 20.44533 25 |
still experimenting - this time without library(Rcpp)
cppFunction(
depends = c("decido", "geometries")
, include = c(
'#include "decido/decido.hpp"'
, '#include "geometries/coordinates/coordinates_impl.hpp"'
)
, code = '
SEXP tris( Rcpp::DataFrame sf ) {
std::string geom_col = sf.attr("sf_column");
Rcpp::List sfc = sf[ geom_col ];
Rcpp::DataFrame df = geometries::coordinates::coordinates_impl( sfc );
Rcpp::NumericVector x = df["c1"];
Rcpp::NumericVector y = df["c2"];
R_xlen_t i;
R_xlen_t n = sfc.length();
Rcpp::List res( n );
R_xlen_t total_rows = 0; // for keeping track of final number of coordinates
for( i = 0; i < n; ++i ) {
SEXP polygon = sfc[ i ];
Rcpp::IntegerVector idx = decido::api::earcut( polygon );
Rcpp::NumericVector xx = x[ idx ];
Rcpp::NumericVector yy = y[ idx ];
total_rows = total_rows + xx.length();
res[ i ] = Rcpp::List::create(
Rcpp::_["x"] = xx,
Rcpp::_["y"] = yy
);
}
Rcpp::List coords = geometries::utils::collapse_list( res, total_rows );
return Rcpp::List::create(
Rcpp::_["data"] = sf, // TODO: remove geometry
Rcpp::_["coord"] = coords
);
}
'
)
sf <- sfheaders::sf_cast( silicate::inlandwaters, "POLYGON" )
tri1 <- idxTRI0(sf)
tri2 <- tris( sf )
microbenchmark::microbenchmark(
sc = { idxTRI0(sf) },
cpp = { tris( sf ) },
times = 25
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# sc 82.92683 84.99317 88.46958 89.41726 91.01554 93.91182 25
# cpp 15.29395 15.54418 17.06758 15.93848 16.80718 22.00044 25 |
@dcooley just FYI, this is related as the "packed SpatVector" in the terra package uses a run-length encode for vector objects (much like what i.e. see "index" in library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
str(pack(v))
Formal class 'PackedSpatVector' [package "terra"] with 5 slots
..@ type : chr "polygons"
..@ crs : chr "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223"| __truncated__
..@ coordinates: num [1:3995, 1:2] 6.03 6.03 6.04 6.04 6.04 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..@ index : num [1:12, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "id" "part" "hole" "start"
..@ attributes :'data.frame': 12 obs. of 5 variables:
.. ..$ ID_1 : num [1:12] 1 1 1 1 1 2 2 2 3 3 ...
.. ..$ NAME_1: chr [1:12] "Diekirch" "Diekirch" "Diekirch" "Diekirch" ...
.. ..$ ID_2 : num [1:12] 1 2 3 4 5 6 7 12 8 9 ...
.. ..$ NAME_2: chr [1:12] "Clervaux" "Diekirch" "Redange" "Vianden" ...
.. ..$ AREA : num [1:12] 312 218 259 76 263 188 129 210 185 251 ...
I find it a terrible shame that there's no community-consistency around use of this kind of representation, but obviously several folks have settled on something that could be pretty standard so there's enough to tell a story here, I'll try to do that. |
To support some stuff in S2 I needed a coordinates -> WK converter, and it turns out it's really fast! I haven't looked at the source code of sfheaders or geometries closely enough to know what the differences are (or maybe my benchmark is measuring something different). This also doesn't do multi* geometries yet. library(wk) # remotes::install_github("paleolimbot/wk")
states_df <- ggplot2::map_data("state")
states_df$region <- factor(states_df$region)
bench::mark(
wk = coords_polygon_translate_wkb(
states_df$long,
states_df$lat,
feature_id = states_df$region,
ring_id = states_df$group
),
wk_sxp = coords_polygon_translate_wksxp(
states_df$long,
states_df$lat,
feature_id = states_df$region,
ring_id = states_df$group
),
sfheaders = sfheaders::sfc_polygon(
states_df,
x = "long",
y = "lat",
polygon_id = "region",
linestring_id = "group"
),
check = FALSE
)[1:5]
#> # A tibble: 3 x 5
#> expression min median `itr/sec` mem_alloc
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt>
#> 1 wk 970µs 1.25ms 800. 1.26MB
#> 2 wk_sxp 749.1µs 1.01ms 987. 1.15MB
#> 3 sfheaders 12.3ms 12.78ms 77.2 1018.67KB Created on 2020-06-21 by the reprex package (v0.3.0) |
yeah the polygon performance is a known limitation, and I've also been working on improving it, whilst also keeping the function generic so it will work on any geometry / level of nesting in the result. The splitting of the data is the easy bit I think, and I've added it as a comparison to your benchmark microbenchmark::microbenchmark(
wk = { coords_polygon_translate_wkb(
states_df$long,
states_df$lat,
feature_id = states_df$region,
ring_id = states_df$group
)
},
wk_sxp = { coords_polygon_translate_wksxp(
states_df$long,
states_df$lat,
feature_id = states_df$region,
ring_id = states_df$group
)
},
geom = {
geometries:::rcpp_nested_rleid(
states_df
, c( 4, 5 )
)
}
)
# Unit: microseconds
# expr min lq mean median uq max neval
# wk 645.516 695.6870 968.0927 712.8785 1011.287 8657.403 100
# wk_sxp 437.039 479.7635 736.2450 496.5695 569.168 12404.027 100
# geom 714.097 748.2310 1047.1301 772.6895 920.459 9118.749 100 It still needs a bit of work to package the output, but hopefully I'll end up with one function which can create any geometry shape. |
Cool! Creating any geometry shape sounds really cool. When you've got your geometry format down I'd love to write wk readers and writers to support them! |
Another update - I've got a general "make geometry" function, which will put matrices inside lists based on the number of I'm going to add in a Updated benchmark library(wk)
states_df <- ggplot2::map_data("state")
states_df$region <- factor(states_df$region)
microbenchmark::microbenchmark(
wk = {
wk <- coords_polygon_translate_wkb(
states_df$long,
states_df$lat,
feature_id = states_df$region,
ring_id = states_df$group
)
},
wk_sxp = {
wk_sxp <- coords_polygon_translate_wksxp(
states_df$long,
states_df$lat,
feature_id = states_df$region,
ring_id = states_df$group
)
},
geom = {
g <- geometries:::rcpp_nested_rleid(
states_df
, c( 4, 5 )
, c( 0, 1 )
)
}
)
# Unit: microseconds
# expr min lq mean median uq max neval
# wk 682.953 702.714 915.5185 714.7985 761.6645 8644.427 100
# wk_sxp 463.638 487.296 627.3311 496.5690 520.1725 10669.165 100
# geom 527.690 584.324 915.1112 593.5455 623.5320 11102.187 100
> g[[8]]
[[1]]
[,1] [,2]
[1,] -77.13731 38.94394
[2,] -77.06283 38.99551
[3,] -77.01699 38.97259
[4,] -76.93105 38.89238
[5,] -77.05136 38.80643
[6,] -77.05136 38.82935
[7,] -77.06283 38.86373
[8,] -77.07428 38.88664
[9,] -77.09720 38.90956
[10,] -77.13731 38.94394
> wk_sxp[[8]]
[[1]]
[,1] [,2]
[1,] -77.13731 38.94394
[2,] -77.06283 38.99551
[3,] -77.01699 38.97259
[4,] -76.93105 38.89238
[5,] -77.05136 38.80643
[6,] -77.05136 38.82935
[7,] -77.06283 38.86373
[8,] -77.07428 38.88664
[9,] -77.09720 38.90956
[10,] -77.13731 38.94394
attr(,"class")
[1] "wk_polygon" |
That's genius! Absolutely the right way to parameterize this on the C++ end. Switching on geometry type is a pain, which is why there are no multi* parsers in wk. |
@mdsumner - I figured it would be good to sketch out designs / requirements for supporting scilicate / hypertidy structures and the like.
I've started porting all the utility functions from
sfheaders
into here. I've added a few examples to the readme, and I think some of these will be applicable to your requirements (like thecoordinate_indices()
andid_positions()
).so, taking your split.data.frame requirement, what object would this operate on, and what does it give as an output?
TODO
The text was updated successfully, but these errors were encountered: