-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What's the point? #1
Comments
As I see it, a performant stand-alone (i.e. GDAL independent) driver for reading and especially writing flatgeobuf files from R makes sense because:
The V8 based approach gets real slow real quick when writing large data. So my hope is that via C++ we can speed things up drastically. |
You make a convincing argument. In terms of the design, these are what I consider the core functionality
I've ordered these by what I think would be the priority to implement, with 3 & 4 possibly redundant, but may be nice. |
That sounds like a good plan. What about geometry only data, i.e. |
yes I image we'd support I think the first step is to understand how Then we need to pass the result geobuffer from R to javascript - have you done this in your leaflet demo, or did you read the buffer directly into the javascript library? |
I read it directly. The way this is currently implemented is:
So as it is currently implemented. I would write the whole file once and then stream it in chunks onto the map. I am not sure if it is possible to stream directly from R to JS... |
Maybe @bjornharrtell can provide some input as to how serialize works and what would be the best path? |
Indeed serialize is implemented as a reference/example and using mapbox types for feature, geometry and geojson representations because that's what I found most available/accessible as generic implementations of such things in the C++ world. It is also not a streaming API like I have in the reference JavaScript implementation. I'm not good enough with C++ to determine what a good streaming API and implementation should look like (see flatgeobuf/flatgeobuf#24). Serialize simply takes a mapbox feature collection (a "table" of geometry and attribute data) and serializes into a flatgeobuf binary. This process involves writing a header, index and the features. The index should be optional but is not implemented yet (but is in the GDAL driver). The index makes it possible to do fast filtering on individual feature or spatial bounding box without reading the whole dataset but requires more time to write and cannot be serialized as a stream because it's a two pass process. The use case for having no index is when the the whole dataset is to be consumed at the client side without any filtering and can then be serialized as a stream. It sounds like you really want to serialize from something else (I don't know what these sfg, sfc and sf are) and considering why you are doing this in the first place I would agree it's the way to go and might not even be more difficult to adapt the code to your types than trying to map your stuff to the mapbox types. If your types are general purpose in the R community and easily imported into my repository I could definitely consider adopting support for them in the reference C++ impl. |
Yes.
Given everything in the R world is also in the C++ world, the mapping is straight forward - done here (but I have a suspicion it can be improved by re-assigning the container type, rather than re-populating a new container) From my first reading of From then on comes the magic, which I need to read through and understand what it's doing to construct the And for some context
And another aside, I'll ask the guys at deck.gl to consider supporting flatgeobuf directly too. |
Reading
So, unfortunately I would say the C++ ref impl. is in pretty bad shape right now. I hope to be able to improve it in a not too far future. |
With flatgeobuf/flatgeobuf#27 merged the flatgeobuf C++ ref impl. is now hopefully in a better shape. Naive non-streaming serialize and deserialize are still available and with improved efficiency and correctness. The streaming variants works by the caller defining read/write lambda functions as arguments. Let me know what you think and if you have any questions. |
Thanks @bjornharrtell . I'll have a look over the next few days to see how we can make use of it. |
@tim-salabim (@robertleeplummerjr) I saw you ping me (as my alter-ego) on this thread, and the conversation is directed towards typed javascript arrays for Deck.gl. This is why I haven't progressed this (flatgeobufr) idea much, because I'm leaning towards the view that we can avoid the sf -> geojson conversion, and go direct to a columnar (array) format. Hence my development of Then these columns can be packaged up into a buffer / protobuf and sent over as arrays, and unpacked as arrays directly, without any geojson conversion. We can certainly use the ideas behind flatgeobuf to do the packing into a buffer. |
Note that flatbuffers (on which flatgeobuf is based on) doesn't need encoding / decoding, it's a aligned memory structure that can be memory mapped and accessed directly. Less compact than protobuf but much faster. |
@SymbolixAU @bjornharrtell @robertleeplummerjr this is all beyond my skills, hence the ping to more skilled folks like you. FWIW, I love the idea of packing things up into columnar data (R leaflet does this via d3.data.frames I think) and sending it to the JS methods. GeoJSON is a very restricted format in my opinion and does have much redundant information. I mean, we can make sure that all features are of the same type before sending data, so there's no need to store the type for each feature... @robertleeplummerjr is accepting (what I think are) flat arrays for point features in Leaflet.Glify already, so maybe we could leverage a similar approach for polygons and lines too? |
To be clear @SymbolixAU , you first convert sf -> df with sfheaders and then parse to JSON to send to JS? |
Not sure if it's already a given, but you can fetch FlatGeobuf in the (modern) browser as binary and use as is without any conversion whatsoever if you can use the memory model directly (fx. access the coordinates directly as typed array). The only reason I convert to GeoJSON in the reference implementation is because it's easy to consume in various clients. I also have a conversion directly to OpenLayers which is more efficient but still a conversion to adapt to the expected memory model of an OpenLayers Feature. One way to do it fully async and streaming is using the Fetch API and read the body as ReadableStream (will get the data as ArrayBuffer chunks). |
yeah this is my plan.
I suspected this is the case. Since Tim and I will have control of the R and JS code to both send and receive the data, we can be stricter about the format.
This is where I am going, yes. I've done this for POINT geometries, but the others still rely on sf -> geojson. And the "parse to JSON" step is only converting each column of a data.frame into an array. |
I created an issue in As an aside; @SymbolixAU, there's discussion at https://github.com/geopandas/geo-arrow-spec on using Arrow as a cross-language geospatial in-memory data format. Since you're knowledgeable about the R geospatial ecosystem, your comments would be appreciated. Deck.gl also plans to support using Arrow tables internally: visgl/deck.gl#3758. Since there's a lot of cross-language community interest in Arrow, I'm looking to see if the community can standardize on a single specification. |
Thanks @kylebarron - I've pinged a couple of others on twitter who will have an interest too (in the arrow spec).
a noble goal for sure. |
Keeping this up to date, I'm adding an sfheaders:::rcpp_interleave_sf( mapdeck::roads[1:2, ] )
$coordinates
[1] 145.01429 -37.83046 145.01434 -37.83057 145.01449 -37.83070 145.01599 -37.83148 145.01648 -37.83170 145.01681 -37.83175
[13] 145.01712 -37.83174 145.01750 -37.83167 145.01784 -37.83156 145.01835 -37.83138 145.01860 -37.83133 145.01890 -37.83130
[25] 145.01914 -37.83130 145.01943 -37.83133 145.01973 -37.83138 145.02020 -37.83146 145.02055 -37.83154 145.02064 -37.83159
[37] 145.02075 -37.83159 145.02099 -37.83166 145.01502 -37.83083 145.01556 -37.83113 145.01629 -37.83146 145.01637 -37.83150
[49] 145.01650 -37.83155 145.01659 -37.83157 145.01668 -37.83159 145.01675 -37.83160 145.01689 -37.83162 145.01696 -37.83162
[61] 145.01706 -37.83162 145.01715 -37.83162 145.01729 -37.83160 145.01739 -37.83158 145.01752 -37.83154 145.01817 -37.83132
[73] 145.01834 -37.83127 145.01848 -37.83125 145.01863 -37.83122 145.01881 -37.83121 145.01896 -37.83120 145.01914 -37.83121
[85] 145.01933 -37.83123 145.01951 -37.83126 145.02090 -37.83155 145.02096 -37.83157
$start_indices
[1] 0 20
$n_coordinates
[1] 20 26
$total_coordinates
[1] 46
$stride
[1] 2
$data
... omitted ... I'm currently keeping this an internal function (so not exposing it for use by other packages yet) because I'm not sure it will stay in this package (as it's not strictly anything to do with This format also aligns in with the "nested list" suggestion on the geo-arrow thread. I see this a more appropriate format for sending into a buffer, rather than going |
Makes sense to me. Seems more memory efficient than a
If I read your layout correctly, it's more like the 'flat list' layout the talk about in the linked issue, isn't it? Are you still thinking about encoding these lists into a flatgeobuf or do you have a different buffer format in mind? |
sorry yes, "flat list" is what I meant.
Yes this is still my intention. But, I'm following the geo-arrow, geoPandas and visgl loaders and deckgl arrow support conversations to see where they go too. |
Deck.gl works with 3D coordinates often, and we need those coordinates to be in one flat array, so I doubt we'll adopt FlatGeobuf internally, but deserialization support is possible/likely sometime in the future. |
oh I didn't realise flatgeobuf was constrained to XY only. |
It's not, but it stores Z coordinates as a separate array |
Right. I now understand. Thanks for bring me up to speed :) |
Wow, thanks for the link to the visgl/loaders repo. This looks very interesting and useful! |
fwiw @SymbolixAU that |
yeah I was thinking about silicate too and saw there might be some overlap, but was going to flesh out the design of {interleave} (which I will now make as its own package) first. (edit: currently signed-in as me, dcooley, not other me symbolixau) |
Why does this library exist, what should it do?
The text was updated successfully, but these errors were encountered: