Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge multipolygons with touching inner rings #30

Open
mpadge opened this issue Jan 17, 2017 · 14 comments
Open

Merge multipolygons with touching inner rings #30

mpadge opened this issue Jan 17, 2017 · 14 comments
Assignees

Comments

@mpadge
Copy link
Member

mpadge commented Jan 17, 2017

These are permitted in OSM yet not by SF. osmdata.cpp needs to be modified to merge any touching inner rings into single rings. See also this extended OSM discussion.

@Robinlovelace
Copy link
Member

I must confess I don't fully understand what this means.

@mdsumner
Copy link

mdsumner commented Jul 9, 2017

It means that the two polygons in this first figure cannot be a single "feature", for example (left hand panel only):

image

If those two polygons were "unioned" the shared edge would have to go (a lossy process), unless the two sub-features were stored within a GEOMETRYCOLLECTION.

@Robinlovelace
Does that help?

@mdsumner
Copy link

mdsumner commented Jul 9, 2017

(Gee that OSM wiki page is good, I've never found this explained properly visually before. )

@Robinlovelace
Copy link
Member

Thanks for the clarification @mdsumner. Do you have code that shows how this failing with sf. Would be the final step in me getting this I think - from the OSM wiki there are a number of edge cases that fail it seems.

@mdsumner
Copy link

mdsumner commented Jul 9, 2017

No I don't, I would not expect sf to do that level of checking on creation, but it might now given that GEOS in built -in. I can craft a dummy example but I'll follow up if there's a known case using osm.

@mdsumner
Copy link

mdsumner commented Jul 9, 2017

Here's an example, not the minimal possible but easy to understand I think:

tfile <- sprintf("%s.rda", tempfile())
download.file("https://github.com/mdsumner/scsf/raw/master/data/minimal_mesh.rda", tfile, mode = "wb")
load(tfile)
library(sf)

inner_ring_touching <- st_sf(a = 1, geometry = st_sfc(st_multipolygon(unlist(lapply(st_geometry(minimal_mesh), unclass), recursive = FALSE))))

#plot(inner_ring_touching)
st_is_valid(inner_ring_touching)
#[1] FALSE
#Warning message:
#In evalq((function (..., call. = TRUE, immediate. = FALSE, noBreaks. = FALSE,  :
#  Self-intersection at or near point 0.68999999999999995 0

can we fix it?

valid_mp <- st_buffer(inner_ring_touching, dist = 0)
#plot(valid_mp)
st_is_valid(valid_mp)
#[1] TRUE

The zero width buffer thing is a bit weird, but it's a "well-known" hack. To detect the invalidity and remove it you need to decompose to segments and determine if any edges are repeated in one feature (they would pass each other in different directions in this case) - the zero-width buffer op does this and then gives back a simple feature after the offending edge is detected and the rings reconstructed as a single path. Many tools work like this, a rich graph of the underlying topology is constructed, analysed, discarded in the underlying libs and then out comes a simplified summary.

I have no idea if this "fix" would be useful or damaging in the osmdata context, but I'm happy to help figure that out.

@mpadge
Copy link
Member Author

mpadge commented Jul 10, 2017

That's awesome @mdsumner! This issue is actually pretty important, because we want osmdata to be maximally compliant with OSM itself, and this was one of the few notable cases in which sf/sp prevented that. It was nevertheless bumped fairly low on my priority status, so i'm mighty pleased that you've just conjured up a grand first cut. 🥇

... only problem remains achieving this without direct Imports: sf, but that shouldn't be too tricky.

@mdsumner
Copy link

mdsumner commented Jul 12, 2017

It's not so easy - I think! - but I've worked on each component step for different reasons in the past. The primitive model in sc is the least developed, I get stuck on how to keep the segment vertex pairs, whether to record their path-grouping, keep track of unique directed vs. undirected segments etc.

It will happily build the segment graph from sf:

# devtools::install_github("mdsumner/scsf")
library(scsf)
library(sc)
prim <- PRIMITIVE(inner_ring_touching)

## which segments have to go? find where interchanging vertex pairs is not unique
prim$segment$useg <-  apply(cbind(prim$segment$.vertex0, prim$segment$.vertex1), 1, function(x) paste(sort(x), collapse = "-"))
prim$segment$bad <- duplicated(prim$segment$useg)

## now we have to piece them back together, we should work on just the feature
## with the touching paths so we'd have to identify those first 
prim$segment %>% filter(!bad)

## then we need to trace around the vertex pairs to figure out the boundary cycle and reconstruct it
## in place of the original two paths

I've definitely done each of these things before in small part, it will be simpler to use sc::PRIMITIVE per feature rather than per-layer, as it's not neighbouring features you care about. Having tools for segments versus parts is very high on my agenda at the moment, I'm still searching around for the right approach but I think there's a lot to be gained here - a set of component tools for pulling things apart and putting them back together just so.

I imagine we could use igraph directly for some of this, it's not so hard to go to tidygraph and igraph from here, but I need more understanding of igraph generally to see how to drive it better.

@mdsumner
Copy link

mdsumner commented Jul 16, 2017

I managed to wade through the various pieces I'd experimented with previously and put together some examples that almost finish the task. The first re-constructs the rings of the example above by identifying the offending segment and then finding "cycles" in the segments that remain, identifying and isolating their vertex path.

http://rpubs.com/cyclemumner/291559

This one simply replaces the simple data set with a more complex one and does the same, to show that it works on non-trivial cases.

http://rpubs.com/cyclemumner/291560

The code is ugly, and it "does my head in" because of the constant juggling of relational "tidy" ID matching with structural-index "position in array" matching. But it works, and I'm happy as it brings together quite a few things I've learnt in recent times.

I've used ggplot to plot it because I'm not quite ready to think about the per-feature aspect, but this also has problems with "holes require polypath, but also cannot also carry further group aesthetics" - which is a general limitation of ggplot2, and is related to why sf exists now (and recent updates to leaflet) to remove the "hole/part" ambiguity of multi-polygons.

The second one really highlights the need for this to be done per feature, because we otherwise will have no ability to determine where the holes belong. So there's a reasonable "to-do" list from my perspective, but I hope this shows the promise of what we could achieve with some tools to deal with segment graphs and the higher level groupings we now take for granted in the tidyverse. I really think that igraph and tidygraph will help enormously here too - I'm sure there's a find-cycles ability in igraph, but I just haven't quite been able to apply it in full - andt we need some more attention on their application to "spatial" stuff.

@mdsumner
Copy link

mdsumner commented Jul 16, 2017

I got this in better shape, tested quite a few cases. I re-made some lovely mistakes and so learnt heaps. In short it's straightforward to do it in R, to decompose to segments, remove those that have an intersecting segment heading in the opposite direction (and that trip the GEOS validity alarm), then reconstruct the remaining valid ring by tracing around vertices. I've got it spread across two packages because of other plans but it's simple enough to implement independently without much hassle.

@mpadge
Copy link
Member Author

mpadge commented Jul 16, 2017

This is awesome stuff - thanks so much for your great work here Mike! I haven't had time to do anything about it in the last week, but will be on it asap now ...

@mdsumner
Copy link

mdsumner commented Jul 17, 2017

I've become a bit obsessed with this - it's been a dusty corner on the edge of my thinking for a while, and I've written a faster version using the "datastructures" package, but I'm still doing something wrong there. That cycles() function above won't scale and we'll need something much better. (Using datastructures::hashmap is faster, but I've found examples that fail for some reasons, you could probably see a way to do it easily :).

I also realize that this is exactly what raster::rasterToPolygons does with dissolve=TRUE via rgeos via sp::aggregate. It could be faster than dissolve to use rgeos::gBuffer(x, width = 0) after flattening all the polygon lists together, but what that does is exactly what we are doing, a version of chaining external edges into paths from the segment graph.

Do we have a concrete example where this inner boundary thing is a problem from osmdata?

(I've also realized now why Manifold don't disallow "internal boundaries", it's because they fundamentally always use a constrained triangulation for geometry tests, so they completely avoid the pathologies mentioned in that OSM discussion - because they aren't worried about orientation to an edge, they test intersection with simpler elements that are grouped into larger ones. A ha). If I can get this segment path thing sorted out it's a nice bridge between the finite elements approach and the path based approach to shapes.

@Robinlovelace
Copy link
Member

Looks like some something that could be useful beyond osmdata. Great work by the looks of it!

@mpadge
Copy link
Member Author

mpadge commented Jul 18, 2017

@mdsumner I reiterate: This is such awesome and fundamentally important stuff you're developing here. My promised time to actively work on it is still a day or two away, but note in the meantime that the fabulous ggm::fundCycles() routine does exactly what you need. You just need to convert your vertex matrix to a square matrix of binary neighbor connections, which is exactly what igraph::as_adj() does. That should at least be your scale-able solution there.

I must nevertheless say that at least having had the time to stick all of your analyses in my thinking cap for a few days actually disinclines me to implement this - the restriction here is SF (using my usual distinction of "SF" = OGC; "sf" = edzer), and what OSM allows is entirely sensible and more flexible. I'm very intrigued by implementing an additional osmdata_sc() function, in combination with concurrent development of the gladr issue, and ultimately to allow, encourage, and facilitate any and all of the kind of things you've been exploring here. Exciting stuff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants