What's the point? #1

SymbolixAU · 2020-01-13T23:30:43Z

Why does this library exist, what should it do?

tim-salabim · 2020-01-14T07:44:16Z

As I see it, a performant stand-alone (i.e. GDAL independent) driver for reading and especially writing flatgeobuf files from R makes sense because:

flatgeobuf makes serving/streaming large data to leaflet (and maybe mapdeck?) much more pleasant than waiting for the complete data to be loaded (map is created instantly and responsive even if data is still loading).
we can avoid the large (system) dependency of GDAL via sf/rgdal (not that I care too much personally, but others might e.g. when building something shiny that includes leaflet/mapdeck).
it helps to lure people away from shapefiles ;-)

The V8 based approach gets real slow real quick when writing large data. So my hope is that via C++ we can speed things up drastically.

dcooley · 2020-01-14T07:56:58Z

You make a convincing argument.

In terms of the design, these are what I consider the core functionality

export sf object to flatgeobuf for use in javascript libraries
read flatgeobuf and covert to sf object
read flatgeobuf and pipe directly to javascript libraries
read geojson and export to flatgeobuf

I've ordered these by what I think would be the priority to implement, with 3 & 4 possibly redundant, but may be nice.

tim-salabim · 2020-01-14T08:04:56Z

That sounds like a good plan. What about geometry only data, i.e. sfc?

SymbolixAU · 2020-01-14T20:36:52Z

yes I image we'd support sfg, sfc and sf.

I think the first step is to understand how serialize() works, and see if we can adapt it to work on an sf object (at the moment I think it's set up for a mapbox::geometry::feature object).

Then we need to pass the result geobuffer from R to javascript - have you done this in your leaflet demo, or did you read the buffer directly into the javascript library?

tim-salabim · 2020-01-15T06:41:30Z

I read it directly. The way this is currently implemented is:

copy fgb file to a tempfile (https://github.com/r-spatial/leafem/blob/master/R/file.R#L310-L314) - this is where the writing would happen if it was an sf object
attach tempfile to htmlwidget (https://github.com/r-spatial/leafem/blob/master/R/file.R#L446-L455)
on the JS side find the attachment (https://github.com/r-spatial/leafem/blob/master/inst/htmlwidgets/lib/FlatGeoBuf/fgb.js#L11)
fetch and stream the result onto the map (https://github.com/r-spatial/leafem/blob/master/inst/htmlwidgets/lib/FlatGeoBuf/fgb.js#L77-L78)

So as it is currently implemented. I would write the whole file once and then stream it in chunks onto the map. I am not sure if it is possible to stream directly from R to JS...

tim-salabim · 2020-01-15T06:54:10Z

Maybe @bjornharrtell can provide some input as to how serialize works and what would be the best path?

bjornharrtell · 2020-01-15T15:19:03Z

Indeed serialize is implemented as a reference/example and using mapbox types for feature, geometry and geojson representations because that's what I found most available/accessible as generic implementations of such things in the C++ world. It is also not a streaming API like I have in the reference JavaScript implementation. I'm not good enough with C++ to determine what a good streaming API and implementation should look like (see flatgeobuf/flatgeobuf#24).

Serialize simply takes a mapbox feature collection (a "table" of geometry and attribute data) and serializes into a flatgeobuf binary. This process involves writing a header, index and the features. The index should be optional but is not implemented yet (but is in the GDAL driver). The index makes it possible to do fast filtering on individual feature or spatial bounding box without reading the whole dataset but requires more time to write and cannot be serialized as a stream because it's a two pass process. The use case for having no index is when the the whole dataset is to be consumed at the client side without any filtering and can then be serialized as a stream.

It sounds like you really want to serialize from something else (I don't know what these sfg, sfc and sf are) and considering why you are doing this in the first place I would agree it's the way to go and might not even be more difficult to adapt the code to your types than trying to map your stuff to the mapbox types. If your types are general purpose in the R community and easily imported into my repository I could definitely consider adopting support for them in the reference C++ impl.

SymbolixAU · 2020-01-15T20:46:23Z

It sounds like you really want to serialize from something else (I don't know what these sfg, sfc and sf are)

Yes.
And our sf structures are very closely aligned with mapbox::geometry objects, they too are C++ structures. So I think serialising these is entirely possible and the way to go.

than trying to map your stuff to the mapbox types.

Given everything in the R world is also in the C++ world, the mapping is straight forward - done here (but I have a suspicion it can be improved by re-assigning the container type, rather than re-populating a new container)

From my first reading of serialize, my understanding is the bulk of it is creating vectors of each 'column' of the tabular structure, both the geometry coordinates and associated meta (which is what I do here in sfheaders).

From then on comes the magic, which I need to read through and understand what it's doing to construct the buf object.

And for some context

sfg (simple-feature geometry) == mapbox::geometry::
sfc (simple-feature collection) is a 'container' of sfgs
sf is each geometry with associated data (like a feature / feature collection).

And another aside, I'll ask the guys at deck.gl to consider supporting flatgeobuf directly too.

bjornharrtell · 2020-01-15T21:52:15Z

Reading geojson.h again makes me less than proud, I see it has several issues and shortcomings which has went unfixed because my attention went into the GDAL driver implementation. Some things to note:

fbb.FinishSizePrefixed(feature) is part of flatbuffers API, see http://google.github.io/flatbuffers/classflatbuffers_1_1_flat_buffer_builder.html#a7ba8462e408431054c99d25120326220. Note that in FlatGeobuf, the header is an individual flatbuffer and each feature is an individual flatbuffer.
I see that to get the final binary I concatenate vectors and copy into new buffer, so both inefficient and RAM hungry. (final copy done with memcpy at

flatgeobufr/src/geojson.h

Line 201 in df9a97a

memcpy(buf, data.data(), data.size());

)
The code creates a spatial tree index but it seems unused in the tests (and indeed I have no filtering support for deserialize) and it will be corrupt unless the input features are correctly sorted.

So, unfortunately I would say the C++ ref impl. is in pretty bad shape right now. I hope to be able to improve it in a not too far future.

bjornharrtell · 2020-01-20T19:44:35Z

With flatgeobuf/flatgeobuf#27 merged the flatgeobuf C++ ref impl. is now hopefully in a better shape. Naive non-streaming serialize and deserialize are still available and with improved efficiency and correctness. The streaming variants works by the caller defining read/write lambda functions as arguments. Let me know what you think and if you have any questions.

SymbolixAU · 2020-01-24T01:32:02Z

Thanks @bjornharrtell . I'll have a look over the next few days to see how we can make use of it.

SymbolixAU · 2020-04-06T21:28:35Z

@tim-salabim (@robertleeplummerjr) I saw you ping me (as my alter-ego) on this thread, and the conversation is directed towards typed javascript arrays for Deck.gl.

This is why I haven't progressed this (flatgeobufr) idea much, because I'm leaning towards the view that we can avoid the sf -> geojson conversion, and go direct to a columnar (array) format.

Hence my development of sfheaders::sf_to_df(). This gives us all the data in columns, and every coordinate has an attribute associated with it (if you use fill = TRUE).

Then these columns can be packaged up into a buffer / protobuf and sent over as arrays, and unpacked as arrays directly, without any geojson conversion.

We can certainly use the ideas behind flatgeobuf to do the packing into a buffer.

bjornharrtell · 2020-04-06T23:28:32Z

Note that flatbuffers (on which flatgeobuf is based on) doesn't need encoding / decoding, it's a aligned memory structure that can be memory mapped and accessed directly. Less compact than protobuf but much faster.

tim-salabim · 2020-04-07T07:11:38Z

@SymbolixAU @bjornharrtell @robertleeplummerjr this is all beyond my skills, hence the ping to more skilled folks like you.

FWIW, I love the idea of packing things up into columnar data (R leaflet does this via d3.data.frames I think) and sending it to the JS methods. GeoJSON is a very restricted format in my opinion and does have much redundant information. I mean, we can make sure that all features are of the same type before sending data, so there's no need to store the type for each feature...

@robertleeplummerjr is accepting (what I think are) flat arrays for point features in Leaflet.Glify already, so maybe we could leverage a similar approach for polygons and lines too?

tim-salabim · 2020-04-07T07:44:11Z

To be clear @SymbolixAU , you first convert sf -> df with sfheaders and then parse to JSON to send to JS?

bjornharrtell · 2020-04-07T09:48:54Z

Not sure if it's already a given, but you can fetch FlatGeobuf in the (modern) browser as binary and use as is without any conversion whatsoever if you can use the memory model directly (fx. access the coordinates directly as typed array). The only reason I convert to GeoJSON in the reference implementation is because it's easy to consume in various clients. I also have a conversion directly to OpenLayers which is more efficient but still a conversion to adapt to the expected memory model of an OpenLayers Feature.

One way to do it fully async and streaming is using the Fetch API and read the body as ReadableStream (will get the data as ArrayBuffer chunks).

SymbolixAU · 2020-04-07T21:43:25Z

you can fetch FlatGeobuf in the (modern) browser as binary and use as is without any conversion whatsoever if you can use the memory model directly (fx. access the coordinates directly as typed array)

yeah this is my plan.

The only reason I convert to GeoJSON in the reference implementation is because it's easy to consume in various clients

I suspected this is the case. Since Tim and I will have control of the R and JS code to both send and receive the data, we can be stricter about the format.

you first convert sf -> df with sfheaders and then parse to JSON to send to JS?

This is where I am going, yes. I've done this for POINT geometries, but the others still rely on sf -> geojson. And the "parse to JSON" step is only converting each column of a data.frame into an array.

kylebarron · 2020-04-08T02:03:26Z

And another aside, I'll ask the guys at deck.gl to consider supporting flatgeobuf directly too.

I created an issue in loaders.gl which is a project to load serialization formats into memory to be used with Deck.gl: visgl/loaders.gl#716.

As an aside; @SymbolixAU, there's discussion at https://github.com/geopandas/geo-arrow-spec on using Arrow as a cross-language geospatial in-memory data format. Since you're knowledgeable about the R geospatial ecosystem, your comments would be appreciated. Deck.gl also plans to support using Arrow tables internally: visgl/deck.gl#3758. Since there's a lot of cross-language community interest in Arrow, I'm looking to see if the community can standardize on a single specification.

SymbolixAU · 2020-04-08T03:38:18Z

Thanks @kylebarron - I've pinged a couple of others on twitter who will have an interest too (in the arrow spec).

Since there's a lot of cross-language community interest in Arrow, I'm looking to see if the community can standardize on a single specification.

a noble goal for sure.

SymbolixAU · 2020-04-23T04:22:24Z

Keeping this up to date, I'm adding an interleave capability to sfheaders which outputs the format

sfheaders:::rcpp_interleave_sf( mapdeck::roads[1:2, ] )
$coordinates
 [1] 145.01429 -37.83046 145.01434 -37.83057 145.01449 -37.83070 145.01599 -37.83148 145.01648 -37.83170 145.01681 -37.83175
[13] 145.01712 -37.83174 145.01750 -37.83167 145.01784 -37.83156 145.01835 -37.83138 145.01860 -37.83133 145.01890 -37.83130
[25] 145.01914 -37.83130 145.01943 -37.83133 145.01973 -37.83138 145.02020 -37.83146 145.02055 -37.83154 145.02064 -37.83159
[37] 145.02075 -37.83159 145.02099 -37.83166 145.01502 -37.83083 145.01556 -37.83113 145.01629 -37.83146 145.01637 -37.83150
[49] 145.01650 -37.83155 145.01659 -37.83157 145.01668 -37.83159 145.01675 -37.83160 145.01689 -37.83162 145.01696 -37.83162
[61] 145.01706 -37.83162 145.01715 -37.83162 145.01729 -37.83160 145.01739 -37.83158 145.01752 -37.83154 145.01817 -37.83132
[73] 145.01834 -37.83127 145.01848 -37.83125 145.01863 -37.83122 145.01881 -37.83121 145.01896 -37.83120 145.01914 -37.83121
[85] 145.01933 -37.83123 145.01951 -37.83126 145.02090 -37.83155 145.02096 -37.83157

$start_indices
[1]  0 20

$n_coordinates
[1] 20 26

$total_coordinates
[1] 46

$stride
[1] 2

$data
... omitted ...

I'm currently keeping this an internal function (so not exposing it for use by other packages yet) because I'm not sure it will stay in this package (as it's not strictly anything to do with sf), nor on the actual structure of the data yet.

This format also aligns in with the "nested list" suggestion on the geo-arrow thread.

I see this a more appropriate format for sending into a buffer, rather than going sf -> df -> buffer, or sf -> geojson -> buffer

tim-salabim · 2020-04-23T07:07:37Z

Makes sense to me. Seems more memory efficient than a df.

This format also aligns in with the "nested list" suggestion on the geo-arrow thread.

If I read your layout correctly, it's more like the 'flat list' layout the talk about in the linked issue, isn't it?

Are you still thinking about encoding these lists into a flatgeobuf or do you have a different buffer format in mind?

SymbolixAU · 2020-04-23T21:36:21Z

sorry yes, "flat list" is what I meant.

Are you still thinking about encoding these lists into a flatgeobuf or do you have a different buffer format in mind?

Yes this is still my intention. But, I'm following the geo-arrow, geoPandas and visgl loaders and deckgl arrow support conversations to see where they go too.

kylebarron · 2020-04-23T21:57:12Z

Deck.gl works with 3D coordinates often, and we need those coordinates to be in one flat array, so I doubt we'll adopt FlatGeobuf internally, but deserialization support is possible/likely sometime in the future.

SymbolixAU · 2020-04-23T22:03:03Z

oh I didn't realise flatgeobuf was constrained to XY only.

kylebarron · 2020-04-23T22:08:36Z

It's not, but it stores Z coordinates as a separate array
https://github.com/bjornharrtell/flatgeobuf/blob/85ef2682f635bafa8533ecc12afa63200135e7ea/src/fbs/feature.fbs#L7-L10

SymbolixAU · 2020-04-23T22:11:11Z

Right. I now understand. Thanks for bring me up to speed :)

tim-salabim · 2020-04-24T07:18:22Z

Yes this is still my intention. But, I'm following the geo-arrow, geoPandas and visgl loaders and deckgl arrow support conversations to see where they go too.

Wow, thanks for the link to the visgl/loaders repo. This looks very interesting and useful!

mdsumner · 2020-04-26T23:17:02Z

fwiw @SymbolixAU that interleave_sf above could act as a drop-in-replace for silicate::sc_path and silicate::sc_coord, and in another direction to mesh3d (with a stride of 4 and h = 1). I'm sure you're aware ... but it's awesome to me that the level of generality you are targetting will have so many uses. You've seen better than anyone the required set of low level tools.

dcooley · 2020-04-26T23:23:53Z

yeah I was thinking about silicate too and saw there might be some overlap, but was going to flesh out the design of {interleave} (which I will now make as its own package) first.

(edit: currently signed-in as me, dcooley, not other me symbolixau)

kylebarron mentioned this issue Apr 8, 2020

Tracker: FlatGeobuf Geospatial Loader visgl/loaders.gl#716

Closed

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What's the point? #1

What's the point? #1

SymbolixAU commented Jan 13, 2020

tim-salabim commented Jan 14, 2020

dcooley commented Jan 14, 2020

tim-salabim commented Jan 14, 2020

SymbolixAU commented Jan 14, 2020 •

edited

Loading

tim-salabim commented Jan 15, 2020 •

edited

Loading

tim-salabim commented Jan 15, 2020

bjornharrtell commented Jan 15, 2020

SymbolixAU commented Jan 15, 2020 •

edited

Loading

bjornharrtell commented Jan 15, 2020

bjornharrtell commented Jan 20, 2020 •

edited

Loading

SymbolixAU commented Jan 24, 2020

SymbolixAU commented Apr 6, 2020 •

edited

Loading

bjornharrtell commented Apr 6, 2020

tim-salabim commented Apr 7, 2020 •

edited

Loading

tim-salabim commented Apr 7, 2020

bjornharrtell commented Apr 7, 2020

SymbolixAU commented Apr 7, 2020

kylebarron commented Apr 8, 2020

SymbolixAU commented Apr 8, 2020

SymbolixAU commented Apr 23, 2020

tim-salabim commented Apr 23, 2020

SymbolixAU commented Apr 23, 2020 •

edited

Loading

kylebarron commented Apr 23, 2020

SymbolixAU commented Apr 23, 2020 •

edited

Loading

kylebarron commented Apr 23, 2020

SymbolixAU commented Apr 23, 2020

tim-salabim commented Apr 24, 2020

mdsumner commented Apr 26, 2020

dcooley commented Apr 26, 2020 •

edited

Loading

What's the point? #1

What's the point? #1

Comments

SymbolixAU commented Jan 13, 2020

tim-salabim commented Jan 14, 2020

dcooley commented Jan 14, 2020

tim-salabim commented Jan 14, 2020

SymbolixAU commented Jan 14, 2020 • edited Loading

tim-salabim commented Jan 15, 2020 • edited Loading

tim-salabim commented Jan 15, 2020

bjornharrtell commented Jan 15, 2020

SymbolixAU commented Jan 15, 2020 • edited Loading

bjornharrtell commented Jan 15, 2020

bjornharrtell commented Jan 20, 2020 • edited Loading

SymbolixAU commented Jan 24, 2020

SymbolixAU commented Apr 6, 2020 • edited Loading

bjornharrtell commented Apr 6, 2020

tim-salabim commented Apr 7, 2020 • edited Loading

tim-salabim commented Apr 7, 2020

bjornharrtell commented Apr 7, 2020

SymbolixAU commented Apr 7, 2020

kylebarron commented Apr 8, 2020

SymbolixAU commented Apr 8, 2020

SymbolixAU commented Apr 23, 2020

tim-salabim commented Apr 23, 2020

SymbolixAU commented Apr 23, 2020 • edited Loading

kylebarron commented Apr 23, 2020

SymbolixAU commented Apr 23, 2020 • edited Loading

kylebarron commented Apr 23, 2020

SymbolixAU commented Apr 23, 2020

tim-salabim commented Apr 24, 2020

mdsumner commented Apr 26, 2020

dcooley commented Apr 26, 2020 • edited Loading

SymbolixAU commented Jan 14, 2020 •

edited

Loading

tim-salabim commented Jan 15, 2020 •

edited

Loading

SymbolixAU commented Jan 15, 2020 •

edited

Loading

bjornharrtell commented Jan 20, 2020 •

edited

Loading

SymbolixAU commented Apr 6, 2020 •

edited

Loading

tim-salabim commented Apr 7, 2020 •

edited

Loading

SymbolixAU commented Apr 23, 2020 •

edited

Loading

SymbolixAU commented Apr 23, 2020 •

edited

Loading

dcooley commented Apr 26, 2020 •

edited

Loading