oeklo/geojson-msgspec

This project contains code published as an example in the msgspec project <https://jcristharif.com/msgspec/>, backported to Python 3.8.

GeoJSON

GeoJSON is a popular format for encoding geographic data. Its specification describes nine different types a message may take (seven "geometry" types, plus two "feature" types). Here we provide one way of implementing that specification using msgspec to handle the parsing and validation.

The loads and dumps functions defined below work similarly to the standard library's json.loads/json.dumps, but:

  • Will result in high-level msgspec.Struct objects representing GeoJSON types
  • Will error nicely if a field is missing or the wrong type
  • Will fill in default values for optional fields
  • Will decode and encode significantly faster than the json module (as well as most other JSON implementations in Python).
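
The validation and default-filling behavior listed above can be sketched with a single struct. This is a minimal illustration, not the example's own code: ``Marker`` and its fields are hypothetical names chosen for the sketch, and only ``msgspec.Struct``, ``msgspec.json.decode`` and ``msgspec.ValidationError`` are taken from msgspec itself.

.. code-block:: python

    from typing import Optional

    import msgspec


    class Marker(msgspec.Struct):
        """Hypothetical stand-in for a GeoJSON type (not part of the example)."""

        x: float
        y: float
        label: Optional[str] = None  # optional field with a default value


    # A missing optional field is filled in with its default.
    marker = msgspec.json.decode(b'{"x": 1.0, "y": 2.5}', type=Marker)

    # A missing required field raises msgspec.ValidationError.
    try:
        msgspec.json.decode(b'{"x": 1.0}', type=Marker)
    except msgspec.ValidationError as exc:
        missing_error = str(exc)

    # A field of the wrong type raises msgspec.ValidationError as well.
    try:
        msgspec.json.decode(b'{"x": "oops", "y": 2.5}', type=Marker)
    except msgspec.ValidationError as exc:
        type_error = str(exc)

Both error messages name the offending field, which is what makes the failures easy to act on.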

This example makes use of msgspec.Struct types to define the different GeoJSON types, and :ref:`struct-tagged-unions` to differentiate between them. See the relevant docs for more information.
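
As a rough sketch of how tagged unions tell message types apart, the snippet below reduces the idea to two geometry types. It assumes the default tagging behavior of ``tag=True`` (the class name is stored under a ``type`` key) and is not the full implementation: the real example covers all nine GeoJSON types, and the ``Geometry`` alias and the small ``loads``/``dumps`` wrappers here are written for illustration only. ``typing.Union``, ``List`` and ``Tuple`` are used so the code also runs on Python 3.8.

.. code-block:: python

    from typing import List, Tuple, Union

    import msgspec

    Position = Tuple[float, float]


    # tag=True adds a "type" field holding the class name, which the
    # decoder uses to pick the matching struct from the union.
    class Point(msgspec.Struct, tag=True):
        coordinates: Position


    class LineString(msgspec.Struct, tag=True):
        coordinates: List[Position]


    # Illustrative union of just two geometry types.
    Geometry = Union[Point, LineString]

    # Reusing a Decoder/Encoder avoids rebuilding type information per call.
    _decoder = msgspec.json.Decoder(Geometry)
    _encoder = msgspec.json.Encoder()


    def loads(data: bytes) -> Geometry:
        """Decode and validate a GeoJSON geometry (sketch)."""
        return _decoder.decode(data)


    def dumps(obj: Geometry) -> bytes:
        """Encode a geometry struct back to JSON bytes (sketch)."""
        return _encoder.encode(obj)


    point = loads(b'{"type": "Point", "coordinates": [1.0, 2.0]}')
    line = loads(b'{"type": "LineString", "coordinates": [[0, 0], [1, 1]]}')

Here ``point`` is decoded as a ``Point`` and ``line`` as a ``LineString``, chosen purely from the value of the ``type`` key in each message.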

The full example source can be found here.

.. literalinclude:: ../../../examples/geojson/msgspec_geojson.py
    :language: python


Here we use the loads function defined above to read some example GeoJSON.

In [1]: import msgspec_geojson

In [2]: with open("canada.json", "rb") as f:
   ...:     data = f.read()

In [3]: canada = msgspec_geojson.loads(data)

In [4]: type(canada)  # loaded as high-level, validated object
Out[4]: msgspec_geojson.FeatureCollection

In [5]: canada.features[0].properties
Out[5]: {'name': 'Canada'}

Comparing performance to orjson, json, and geojson:

In [6]: %timeit msgspec_geojson.loads(data)  # benchmark msgspec
6.15 ms ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit orjson.loads(data)  # benchmark orjson
8.67 ms ± 20.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit json.loads(data)  # benchmark json
27.6 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit geojson.loads(data)  # benchmark geojson
93.9 ms ± 88.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This shows that the readable msgspec implementation above is 1.4x faster than orjson (on this data), while also ensuring the loaded data is valid GeoJSON. Compared to geojson (another validating GeoJSON library for Python), loading the data using msgspec was 15.3x faster.
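
The speedup figures follow directly from the mean timings reported above. The short calculation below reproduces them; the dictionary simply restates those means in milliseconds, and the variable names are chosen for this sketch.

.. code-block:: python

    # Mean load times in milliseconds, copied from the %timeit output above.
    timings_ms = {
        "msgspec": 6.15,
        "orjson": 8.67,
        "json": 27.6,
        "geojson": 93.9,
    }

    # How many times longer each library takes relative to msgspec.
    baseline = timings_ms["msgspec"]
    speedups = {
        name: round(elapsed / baseline, 1)
        for name, elapsed in timings_ms.items()
    }

    for name, factor in speedups.items():
        print(f"{name}: {factor}x the msgspec load time")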
