Tracker: FlatGeobuf Geospatial Loader #716

Closed
2 of 8 tasks
kylebarron opened this issue Apr 8, 2020 · 11 comments

@kylebarron
Collaborator

kylebarron commented Apr 8, 2020


FlatGeobuf is

A performant binary encoding for geographic data based on flatbuffers that can hold a collection of Simple Features including circular interpolations as defined by SQL-MM Part 3.

It's a relatively new geospatial serialization format, but is exciting for a few reasons.

  • Fast. According to their benchmarks, it's considerably faster than both Shapefile and GeoJSON reading (tests done with the GDAL driver)
  • Streaming support. In this Observable notebook you can see the progressive data load
  • Filtered reads over HTTP. An R-tree is included in the serialized file. So you can pass a bounding box and use HTTP range requests to load only the geometries intersecting your query
  • Direct flat array access. I believe that coordinates are saved as a flat, interleaved array in the flatbuffer, so it should be fast to parse directly into a flat typed array.
  • Cross language support. So far there's a GDAL driver, plus reference TypeScript, C++, Java, and Rust implementations.

Cons:

  • Compressed files can't have filtered reads

Since this is a relatively new data format, it isn't the most widely used, but for users who have control over the backend data format and want highest performance, it could be ideal.
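
As a rough illustration of the "filtered reads over HTTP" point above, here is a minimal sketch of the range-request half of the idea. It does not read FlatGeobuf's R-tree; it only assumes that some index lookup has already produced a byte offset and length. The names `byteRangeHeader` and `fetchByteRange` are hypothetical and not part of any library.

```javascript
// Hypothetical helper: format an HTTP Range header value for a byte span.
// Range end offsets are inclusive, hence the "- 1".
function byteRangeHeader(offset, length) {
  if (!Number.isInteger(offset) || !Number.isInteger(length) || offset < 0 || length <= 0) {
    throw new RangeError('offset must be >= 0 and length must be > 0');
  }
  return `bytes=${offset}-${offset + length - 1}`;
}

// Fetch only the requested bytes. Assumes the server honors Range requests
// (it replies with 206 Partial Content when it does).
async function fetchByteRange(url, offset, length) {
  const response = await fetch(url, {
    headers: {Range: byteRangeHeader(offset, length)}
  });
  if (response.status !== 206) {
    throw new Error(`Expected 206 Partial Content, got ${response.status}`);
  }
  return response.arrayBuffer();
}
```

This also hints at why the compressed-file con applies: byte offsets from an index only line up with the bytes on the server if the file is stored uncompressed.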

@kylebarron
Collaborator Author

SymbolixAU/flatgeobufr#1 (comment)

you can fetch FlatGeobuf in the (modern) browser as binary and use as is without any conversion whatsoever if you can use the memory model directly (fx. access the coordinates directly as typed array).

@ibgreen
Collaborator

ibgreen commented Apr 8, 2020

Some time ago I did a test implementation of a flatgeobuf loader.

The demo is awesome, and the source repo is impressively ambitious and supports multiple language bindings, but I also found it to be something of a work in progress.

At least the javascript bindings appear to still be rough around the edges: The published npm module was not properly set up to be usable. I tried to fork the code, but it was complex (perhaps a bit "over architected"?) and written entirely in TypeScript, which complicated building it inside loaders.gl.

At the time I decided that focusing on converting geojson to arrow (which also supports streaming loads) would be more generally useful. That said, I believe that flatgeobuf is a more compact representation, and I do like the initiative.

I can push my branch if you'd like to take a second look.

@ibgreen
Collaborator

ibgreen commented Apr 8, 2020

you can fetch FlatGeobuf in the (modern) browser as binary and use as is without any conversion whatsoever if you can use the memory model directly (fx. access the coordinates directly as typed array).

Is this true? I guess it could depend on what one needs, but I was assuming the flatgeobuf achieved its compactness by "interleaving" data - Are all the coordinates available in an array that could be accessible from the GPU?

@kylebarron
Collaborator Author

At least the javascript bindings appear to still be rough around the edges: The published npm module was not properly set up to be usable

Yes, I opened an issue about this (flatgeobuf/flatgeobuf#52). The package.json is missing a main field, so require('flatgeobuf') fails.

He suggested importing the untranspiled source, e.g.

require('flatgeobuf/lib/generic/featurecollection.js')

I haven't researched it much, but I think it would be possible to access everything we'd need through those imports.

At the time I decided that focusing on converting geojson to arrow (which also supports streaming loads) would be more generally useful. That said, I believe that flatgeobuf is a more compact representation, and I do like the initiative.

I think you might be right, so I wouldn't necessarily say this is a top priority.

Is this true? I guess it could depend on what one needs, but I was assuming the flatgeobuf achieved its compactness by "interleaving" data - Are all the coordinates available in an array that could be accessible from the GPU?

Are you referring to interleaving features or coordinates? At the coordinate level, we've been working with interleaved data, right? I.e.

[x0, y0, z0, x1, y1, z1, ...]

I don't believe features are interleaved. I believe that each feature is its own flatbuffer, and so you should be able to get an array like

[x0, y0, x1, y1, ...]

without copy from the original bytes. (And it looks possible to get line/polygon ring offsets in a similar fashion, though a copy might be involved there.)
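
A minimal sketch of what "without copy" could look like in practice, assuming the coordinates are stored as 64-bit floats and that their byte offset within the fetched buffer is already known. `viewCoordinates` is a hypothetical helper, not something flatgeobuf exports. Note that a typed-array view needs an aligned offset, so the sketch falls back to a copy otherwise.

```javascript
// Hypothetical helper: view `count` float64 values starting at `byteOffset`.
// Float64Array requires an 8-byte-aligned offset; when the offset is aligned
// the result shares memory with `arrayBuffer`, otherwise the bytes are copied.
function viewCoordinates(arrayBuffer, byteOffset, count) {
  if (byteOffset % Float64Array.BYTES_PER_ELEMENT === 0) {
    return new Float64Array(arrayBuffer, byteOffset, count); // zero-copy view
  }
  const byteLength = count * Float64Array.BYTES_PER_ELEMENT;
  // ArrayBuffer.prototype.slice allocates a new buffer, i.e. this path copies.
  return new Float64Array(arrayBuffer.slice(byteOffset, byteOffset + byteLength));
}
```

The aligned path returns a view whose `.buffer` is the original `ArrayBuffer`, which is the property that matters if the array is later handed to the GPU without an intermediate copy.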

The format does, however, store z/m arrays separately. So if the source data exist in 3 dimensions, you'd need to load the xy and the z arrays separately and intersperse the coordinates into a flat array.
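
The interspersing step described above could be sketched like this. `interleaveXYZ` is a hypothetical name, and the inputs are assumed to be an `[x0, y0, x1, y1, ...]` array plus a separate `[z0, z1, ...]` array of matching length.

```javascript
// Hypothetical helper: merge a flat xy array and a separate z array into
// a single [x0, y0, z0, x1, y1, z1, ...] Float64Array. This necessarily copies.
function interleaveXYZ(xy, z) {
  const count = xy.length / 2;
  if (!Number.isInteger(count) || z.length !== count) {
    throw new RangeError('xy must hold exactly two values per z value');
  }
  const out = new Float64Array(count * 3);
  for (let i = 0; i < count; i++) {
    out[3 * i] = xy[2 * i];         // x
    out[3 * i + 1] = xy[2 * i + 1]; // y
    out[3 * i + 2] = z[i];          // z
  }
  return out;
}
```

So 2D data could stay zero-copy, while 3D data would pay for one pass over the coordinates.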

@ibgreen
Collaborator

ibgreen commented Apr 8, 2020

He suggested importing the untranspiled source, e.g.

I am pretty sure I tried that and ran into other issues, but don't recall exactly (I think not everything I needed was in that dist).

@ibgreen
Collaborator

ibgreen commented Apr 9, 2020

Regarding #718: that is a very neat micro integration!

I'm open for discussion. My initial take is that the primary justifications for including a flatgeobuf loader in loaders.gl are:

  1. Incremental loading. If we include this, we should be able to replicate the Observable notebook in a deck.gl-based example on the loaders.gl web page (showing the GeoJSON render incrementally as it loads).
  2. This should be done by following the loaders.gl parseInBatches model (which is based on AsyncIterators).

I seem to recall that as I worked through the code, it looked like supporting the AsyncIterator model would take a bit of work.
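
For readers unfamiliar with the pattern, here is a minimal sketch of the "async iterator in, async iterator of batches out" shape. This is not the actual loaders.gl `parseInBatches` signature; `makeBatches` and its `batchSize` parameter are hypothetical, and the input is assumed to be any async iterable of already-parsed features.

```javascript
// Hypothetical sketch: group items from an async iterable into arrays of
// at most `batchSize` items, yielding each batch as soon as it is full.
async function* makeBatches(featureIterator, batchSize = 2) {
  let batch = [];
  for await (const feature of featureIterator) {
    batch.push(feature);
    if (batch.length >= batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Flush whatever is left once the input is exhausted.
  if (batch.length > 0) {
    yield batch;
  }
}
```

A consumer can then render each batch as it arrives with `for await (const batch of makeBatches(features, 100)) { ... }`, which is the behavior the incremental demo relies on.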

Also, given that it is essentially a new experimental format, there is not likely to be much flatgeobuf data available, so a meaningful release should probably also offer a writer for the format.

@kylebarron
Collaborator Author

Yes, I agree the incremental loading is essential. I know you mentioned differences in "push/pull" here. I first need to read more about AsyncIterators before attempting to add support for them. It'll also be helpful to read your previous PR or look at other loaders that support incremental loading.

@kylebarron
Collaborator Author

From a few minutes of reading, I think AsyncIterators are supported by default in flatgeobuf... deserializeStream returns an iterator, and each iteration is a promise. (From here)

let it = flatgeobuf.deserializeStream(response.body)
it.next().then(handleResult)
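
To consume every result rather than just the first, the same `next()` call can be driven in a loop. The sketch below uses a stand-in iterator instead of flatgeobuf itself, and it assumes each promise resolves to the usual `{done, value}` iterator result; I have not verified that this is the exact shape `deserializeStream` produces. `makeFakeFeatureIterator` and `drain` are hypothetical names.

```javascript
// Stand-in for an object whose next() returns a promise of {done, value}.
// It is NOT the flatgeobuf API, just the same general shape.
function makeFakeFeatureIterator(features) {
  let index = 0;
  return {
    next() {
      const result = index < features.length
        ? {done: false, value: features[index++]}
        : {done: true, value: undefined};
      return Promise.resolve(result);
    }
  };
}

// Pull results one at a time until the iterator reports it is done.
// Returns the number of features that were handled.
async function drain(it, handleFeature) {
  let count = 0;
  for (;;) {
    const {done, value} = await it.next();
    if (done) {
      return count;
    }
    handleFeature(value);
    count++;
  }
}
```

If the returned object also implements `Symbol.asyncIterator`, a plain `for await` loop would do the same job.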

@ibgreen
Collaborator

ibgreen commented Apr 9, 2020

Yes

  • unfortunately deserializeStream takes a stream as input.
  • parseInBatches takes an asyncIterator as input.

We need a version of deserializeStream() that accepts an async iterator yielding ArrayBuffer chunks.

One approach is to create a Stream wrapper over an async iterator (AsyncIteratorStream) and pass that in.

Although it does suffer from a slight "impedance mismatch", it should work, and such an adapter could be reused for quick-and-dirty integrations of other stream-based parsers.

We already have the opposite helper function, which creates an async iterator from a stream (this is now actually built into Node streams).
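
A minimal sketch of both directions, using the WHATWG `ReadableStream` that is global in modern browsers and recent Node versions. The function names are hypothetical (the AsyncIteratorStream idea above is only a proposal), and this is an illustration of the adapter pattern rather than existing loaders.gl code.

```javascript
// Wrap an async iterable of chunks in a ReadableStream. The stream pulls
// from the iterator on demand, so backpressure is preserved.
function asyncIteratorToStream(asyncIterable) {
  const iterator = asyncIterable[Symbol.asyncIterator]();
  return new ReadableStream({
    async pull(controller) {
      const {done, value} = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(value);
      }
    },
    async cancel(reason) {
      // Give the source iterator a chance to clean up if the reader bails out.
      if (typeof iterator.return === 'function') {
        await iterator.return(reason);
      }
    }
  });
}

// The opposite direction: expose a ReadableStream as an async iterator.
async function* streamToAsyncIterator(stream) {
  const reader = stream.getReader();
  try {
    for (;;) {
      const {done, value} = await reader.read();
      if (done) {
        return;
      }
      yield value;
    }
  } finally {
    reader.releaseLock();
  }
}
```

With an adapter like the first function, a stream-based parser could be fed from an async iterator by passing `asyncIteratorToStream(chunks)` wherever it expects `response.body`.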

@kylebarron
Collaborator Author

Ah I thought you were referring to output, not input.

One approach is to create a Stream wrapper over an async iterator (AsyncIteratorStream) and pass that in.

I think that's worth some exploration. If it's small and reusable it could potentially enable new loaders, both this and Shapefile.

ibgreen changed the title from "FlatGeobuf Geospatial Loader" to "Tracker: FlatGeobuf Geospatial Loader" on Jul 31, 2020
ibgreen mentioned this issue on Oct 24, 2023 (70 tasks)
@ibgreen
Collaborator

ibgreen commented Oct 24, 2023

Linking to this tracker issue in future release trackers, but closing for inactivity.

ibgreen closed this as completed on Oct 24, 2023
2 participants