Commit
Ian Turton committed Jul 16, 2024
1 parent 1f1d381, commit 44767bf
Showing 3 changed files with 206 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
--- | ||
layout: post | ||
title: Is GeoJSON a spatial data format? | ||
date: 2023-11-11 | ||
categories: gis | ||
--- | ||
# Is GeoJSON a good spatial data format? | ||
|
||
A few days ago on Mastodon [Eli Pousson](https://fosstodon.org/@[email protected]) | ||
asked: | ||
|
||
> Can anyone suggest examples of files that can contain location info but aren't often considered spatial data | ||
> file formats? | ||
> | ||
He suggested EXIF, [Iván Sánchez Ortega](@[email protected] ) | ||
followed up with spreadsheets, and being devilish I said GeoJSON. | ||
|
||
This led to more discussion, with people asking why I thought that, so instead of being flippant I thought | ||
about it. This blog post is the result of those thoughts, which I thought were fairly obvious but which, from | ||
things people have said since, maybe aren't. | ||
|
||
I've been a developer for most of my career, so my main interest in a spatial data format is that: | ||
|
||
1. it stores my spatial data as I want it to, | ||
2. it's fast to read and, to a lesser extent, to write, | ||
3. it's easy to manage. | ||
|
||
One seems obvious: if I store a point and then ask for it back, I want to get that point back (to the limit | ||
of the precision of the processor's floating point). If a format can't manage that then please don't use it. | ||
This failure is not common, but Excel comes to mind as a program that takes good data and trashes it. If it | ||
isn't changing [gene names into | ||
dates](https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates) then | ||
it's [reordering the dbf file to destroy your | ||
shapefile](https://gis.stackexchange.com/questions/132359/how-is-attribute-data-in-dbf-file-tied-to-shapefile-location-data-in-shp-file). | ||
GeoJSON can also fail at this, as the standard says that I must store the data in WGS 84 (lon/lat). That is | ||
fine if my data is already in WGS 84, but suppose I have some high quality OSGB data that is carefully | ||
surveyed to fractions of a millimetre. The underlying code does a conversion to WGS 84 in the background, | ||
and the developer, wanting to save space, has limited the number of decimal places to, say, | ||
6 (OK, [that was me](https://osgeo-org.atlassian.net/browse/GEOT-6650)). When the data gets converted back | ||
to OSGB I'm looking at errors of centimetres (or worse), and given the vagaries of floating point | ||
representation I may not be able to tell. | ||
|
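To put a rough number on that loss, here is a minimal sketch of what rounding to 6 decimal places does to a lon/lat pair. The sample coordinate and the figure of roughly 111,320 m per degree of latitude are assumptions made for illustration, and the sketch deliberately leaves out the OSGB reprojection itself.

```python
import math

# Approximate length of one degree of latitude in metres. This is an assumed
# spherical-Earth rule of thumb, good enough for an order-of-magnitude check.
METRES_PER_DEGREE_LAT = 111_320.0


def rounding_error_metres(lon, lat, places=6):
    """Return the approximate ground distance a point moves when rounded."""
    d_lat = (round(lat, places) - lat) * METRES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    d_lon = ((round(lon, places) - lon) * METRES_PER_DEGREE_LAT
             * math.cos(math.radians(lat)))
    return math.hypot(d_lon, d_lat)


# Hypothetical point near Glasgow, stored with more digits than 6 places keep.
error = rounding_error_metres(-4.2518063, 55.8642371)
print(f"moved by about {error * 100:.1f} cm")
```

The worst case is half of the last decimal place kept, which for latitude works out at about 5.5 cm, so millimetre-level survey data cannot survive the round trip.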
||
Two comes from being a GeoServer developer: a largish chunk of the time taken to draw a web map (or stream | ||
out a WFS file) is taken up by reading the data from the disk. Much of the rest of the time goes on converting | ||
the data into a form that we can draw. Ideally, we only want to read in the features needed for the map the | ||
user has requested (actually, ideally we want to **not** read in most of the data by having it already be in | ||
the cache, but that is hard to do). So we like indexed datasets: both spatial indexes and attribute indexes | ||
can substantially speed up map drawing. As the size of spatial datasets increases, the time taken to fetch the | ||
next feature from the store becomes more and more important. An index allows the program to skip to the | ||
correct place in the file for either a specific feature or for features that are in a specific place or | ||
contain a certain attribute with the requested value. This is a great time saver: imagine trying to look | ||
something up in a big book by using the index compared to paging through it reading each page in turn. | ||
|
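As a toy illustration of why an index helps, the sketch below builds a uniform grid over a handful of points and answers a bounding box query by looking only at the cells the box touches. The grid is a stand-in chosen for brevity: real stores typically use tree structures (PostGIS uses GiST indexes, GeoPackage an SQLite R-tree), and the cell size, feature ids and coordinates here are all made up.

```python
from collections import defaultdict

CELL = 1.0  # grid cell size in coordinate units (an arbitrary choice)


def cell_of(x, y):
    """Return the grid cell that contains the point (x, y)."""
    return (int(x // CELL), int(y // CELL))


def build_grid_index(points):
    """Map each grid cell to the ids of the features that fall inside it."""
    index = defaultdict(list)
    for fid, (x, y) in points.items():
        index[cell_of(x, y)].append(fid)
    return index


def query_bbox(index, points, minx, miny, maxx, maxy):
    """Return ids of points inside the box, visiting only overlapping cells."""
    hits = []
    cx0, cy0 = cell_of(minx, miny)
    cx1, cy1 = cell_of(maxx, maxy)
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            for fid in index.get((cx, cy), []):
                x, y = points[fid]
                # Cells can stick out past the box, so test each candidate.
                if minx <= x <= maxx and miny <= y <= maxy:
                    hits.append(fid)
    return sorted(hits)


# Hypothetical features: id -> (x, y)
points = {1: (0.5, 0.5), 2: (10.2, 3.3), 3: (0.9, 1.4), 4: (-5.0, 7.7)}
index = build_grid_index(points)
result = query_bbox(index, points, 0.0, 0.0, 1.0, 2.0)
print(result)  # → [1, 3]
```

Features 2 and 4 are never examined, which is the whole point: the cost of the query depends on how much data is near the box, not on how much data is in the file.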
||
After one or more indexes, the main thing I look for is a binary format that is easy to read (and | ||
write). GeoJSON and GML are both problematic here as they are text formats (which is great in a transfer | ||
format), so for every coordinate of every spatial object the computer has to read in a series of digits | ||
(and punctuation) and convert that into an actual binary number that it can understand. This is a slow | ||
operation (by computer speeds anyway), and if I have a couple of million points in my coastline file then I | ||
don't want to do 4 million slow operations before I even think of drawing something. | ||
|
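The difference between the two kinds of encoding can be sketched with nothing but the Python standard library. This is not how any particular format lays out its bytes; the coordinates are invented and the little-endian double layout is just one plausible choice.

```python
import json
import struct

# Hypothetical lon/lat pairs.
coords = [(-4.2518, 55.8642), (-3.1883, 55.9533), (-0.1276, 51.5072)]

# Text encoding: every number becomes a run of digits that must be parsed
# back into a binary double when the file is read.
as_text = json.dumps(coords)
from_text = [tuple(pair) for pair in json.loads(as_text)]

# Binary encoding: each value is stored as 8 bytes of IEEE 754 double, so
# reading is little more than copying bytes into place.
flat = [value for pair in coords for value in pair]
as_binary = struct.pack(f"<{len(flat)}d", *flat)
unpacked = struct.unpack(f"<{len(flat)}d", as_binary)
from_binary = list(zip(unpacked[0::2], unpacked[1::2]))

print(len(as_text), "bytes of text,", len(as_binary), "bytes of binary")
```

Both routes give back the same values here; the cost is in the digit-by-digit parsing on the text side, which `timeit` will show growing with the number of coordinates.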
||
Three: I have to interact with users on a fairly regular basis, and in a lot of cases these are not spatial | ||
data experts. If a format comes with up to a dozen similarly named files (that are all important) that a GIS | ||
will refuse to process unless you guess which is the important one, then it is more of a pain than a help. And | ||
yes, shapefile, I'm looking at you. If your process still makes use of shapefiles please, please stop doing | ||
that to your users (and the support team) and switch over to GeoPackages, which can store hundreds of data | ||
sets inside a single file. All good GIS products can process them by now; they have been an OGC standard for | ||
nearly 10 years. If you don't think that shapefiles are confusing, go and ask your support team how often they | ||
have been sent just the `.shp` file (or 11 files but not the `.sbn`) or how often they have seen people who | ||
have deleted all the non-`.shp` files to save disk space. | ||
|
||
My other objection to GeoJSON is that I don't know what the structure (or schema) of the data set is until I | ||
have read the entire file. That last record could add several bonus attributes; in fact any (or all) of the | ||
records could do that. From a parser's point of view it is a nightmare. At least GML provides me with a fixed | ||
schema and enforces it throughout the file. | ||
|
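A small sketch of the problem, using an invented three-feature collection: the only way to learn the full set of attributes is to walk every feature, because the last one is free to introduce properties that none of the earlier ones had.

```python
import json

# Hypothetical GeoJSON: only the last feature carries the extra attributes.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]},
   "properties": {"name": "a"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 1]},
   "properties": {"name": "b"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2, 2]},
   "properties": {"name": "c", "height": 12.5, "surveyed": true}}
]}
"""


def infer_schema(collection):
    """Collect every property name, and the value types seen for it."""
    schema = {}
    for feature in collection["features"]:
        # "properties" is allowed to be null, so fall back to an empty dict.
        for key, value in (feature.get("properties") or {}).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


collection = json.loads(geojson_text)
schema = infer_schema(collection)
first_only = set(collection["features"][0]["properties"])

print(sorted(schema))      # → ['height', 'name', 'surveyed']
print(sorted(first_only))  # → ['name']
```

Peeking at the first feature would have reported a single `name` column, so a reader that needs a fixed schema has to scan the whole file before it can return a single record.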
||
When I'm storing data (as opposed to transferring it) I use PostGIS. It's fast and accurate, can store my data | ||
in whatever projection I choose and is capable of interfacing with any GIS program I am likely to use. If | ||
I'm writing new code then it provides good, well tested libraries in all the languages I care about, so I don't | ||
have to get into the weeds of parsing binary formats. If I fetch a feature from PostGIS it will have exactly | ||
the attributes I was expecting, no more and no less. It has good indexes and a nifty DSL (SQL) that I can use | ||
to express my queries, which get dealt with by a cool query optimiser that knows way more than I do about how | ||
to access data in the database. | ||
|
||
If for some reason I need to access my data while I'm travelling or share it with a colleague then I will use | ||
a GeoPackage, which is a neat little database all packaged up in a single file. It's not as quick as PostGIS so | ||
I wouldn't use it for millions of records, but for most day to day GIS data sets it's great. You can even store | ||
your QGIS styles and project in it to make it a single file project transfer format. | ||
|
||
One final point, I sometimes see people preaching that we should go cloud native (and often serverless) by | ||
embracing "modern" standards like GeoJSON and COGs. GeoJSON should never be used as a cloud native storage | ||
option (unless it's so small you can read it once and cache it in memory in which case why are you using the | ||
cloud) as it is large (yes, I know it compresses well) and slow to parse (and slower still if you compressed | ||
it first) and can't be indexed. So that means you have to copy the whole file from a disk on the far side of a | ||
slow internet connection. I don't care if you have fibre to the door it is still slow compared to the disk in | ||
your machine! | ||
|
||
![The Jack Sparrow worst pirate meme but for GeoJSON](/images/geojson.jpg ) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
--- | ||
layout: post | ||
title: Adding a spell check to QGIS | ||
date: 2024-07-16 | ||
categories: foss | ||
--- | ||
|
||
# Adding a Spell Check to QGIS | ||
|
||
(Or what to do on a rainy bank holiday in Glasgow) | ||
|
||
This Monday was a local bank holiday in Glasgow (or at least the university) as a remnant of when the whole | ||
town took a train to Blackpool in the same two weeks so that the ship builders and steel works could stop in a | ||
coordinated fashion. As is required in the UK the weather was awful so I stayed in and being bored looked at | ||
my long list of possible projects. I picked one that has been kicking around on the list for a while: adding a | ||
spell checker for QGIS. As a dyslexic I have spell checking turned on in nearly every program I enter text | ||
into, including `vim`, `IntelliJ` and my browser. So I have always felt that what QGIS really needed was a way | ||
to spell check maps before I printed them at A3 and put them on the wall. | ||
|
||
Back in 2019 North Road wrote a [blog post about custom layout checks | ||
](https://north-road.com/2019/01/14/on-custom-layout-checks-in-qgis-3-6-and-how-they-can-do-your-work-for-you/) | ||
and ended it with a throwaway comment: "It’d even be possible to hook into one of the available Python spell | ||
checking libraries to write a spelling check!". I came across this when I was trying to see if there was an | ||
easy way for my students (many of whom have English as a second language) to avoid handing in projects with | ||
glaring (i.e. I can see them) spelling errors in the title. So I stuck the link on my backlog, until the | ||
proverbial rainy day came along. | ||
|
||
## Implementation | ||
|
||
Obviously I'm the last person who should be allowed to write spell checking software, but the joy of open | ||
source is that for things like this someone else has almost certainly already done it. So a quick DuckDuckGo | ||
search found me installing `pyspellchecker`, which seemed like it would do what I want. It has a pretty easy | ||
interface: once you've created a spell checker object, you can just pass in a list of words and it will return | ||
the (probably) misspelled ones. There is a method to give the most likely correction and another to give | ||
you a list of other possibilities. Armed with this I could create a function to find and check all the text | ||
elements of a print layout. | ||
|
||
```py | ||
import string | ||
|
||
from qgis.core import (check, QgsAbstractValidityCheck, QgsLayoutItemLabel, | ||
                       QgsValidityCheckResult) | ||
from spellchecker import SpellChecker  # provided by the pyspellchecker package | ||
|
||
@check.register(type=QgsAbstractValidityCheck.TypeLayoutCheck) | ||
def layout_check_spelling(context, feedback): | ||
    layout = context.layout | ||
    results = [] | ||
    checker = SpellChecker() | ||
|
||
    for i in layout.items(): | ||
        if isinstance(i, QgsLayoutItemLabel): | ||
            text = i.currentText() | ||
            # Strip surrounding punctuation and drop tokens that end up empty | ||
            words = (word.strip(string.punctuation) for word in text.split()) | ||
            tokens = [word for word in words if word] | ||
            misspelled = checker.unknown(tokens) | ||
            for word in misspelled: | ||
                res = QgsValidityCheckResult() | ||
                res.type = QgsValidityCheckResult.Warning | ||
                res.title = 'Spelling Error?' | ||
                template = f""" | ||
                '<strong>{word}</strong>' may be misspelled, would | ||
                '<strong>{checker.correction(word)}</strong>' be a better choice? | ||
                """ | ||
                # candidates() can return None when it has no suggestions | ||
                possibles = checker.candidates(word) or set() | ||
                if len(possibles) > 1: | ||
                    template += """ | ||
                    Or one of:<br/> | ||
                    <ul> | ||
                    """ | ||
                    for t in possibles: | ||
                        template += f"<li>{t}</li>\n" | ||
                    template += '</ul>' | ||
                res.detailedDescription = template | ||
                results.append(res) | ||
    return results | ||
``` | ||
|
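The tokenising step is the part most likely to surprise, so here it is on its own as a small standalone sketch. The label text is invented, and this version also drops tokens that are empty once the punctuation has been stripped.

```python
import string


def tokenize(text):
    """Split on whitespace and strip punctuation from both ends of each word."""
    stripped = (word.strip(string.punctuation) for word in text.split())
    # A lone "-" strips down to an empty string, so filter those out.
    return [word for word in stripped if word]


tokens = tokenize("Glasgow: rainfall (mm) - anual totals, 2023!")
print(tokens)  # → ['Glasgow', 'rainfall', 'mm', 'anual', 'totals', '2023']
```

Only leading and trailing punctuation is removed, so a word like `don't` keeps its apostrophe and is checked as a whole.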
||
And in theory, that was that! But I'm pretty sure that my students (and everyone else) probably didn't want to | ||
cut and paste that into the console every time they wanted to spell check a map. So, I looked at how to | ||
package this up for QGIS. I built a plugin (using the plugin builder tool), but then things got a little | ||
tricky as I can't see any way for a plugin to add itself to the print layout rather than the main QGIS window | ||
(please let me know if it is possible), and it seemed unintuitive to make people press a button in one window | ||
to affect another one. Besides, the whole point of being a `QgsAbstractValidityCheck` was that the method is | ||
automatically run on print. So I didn't need most of the plugin code, or did I? On further thought I did: | ||
there is a need for some GUI so the user can pick which language they want to use in the spell check. | ||
`pyspellchecker` can spell check English, Spanish, French, Portuguese, German, Italian, Russian, Arabic, | ||
Basque, Latvian and Dutch (so if one of those is your language then please test this for me). I also thought | ||
that providing the option to supply a personal dictionary other than the default might be useful. So that made | ||
use of the dialog that pops up when you click on the plugin. | ||
|
||
But it turns out you can't register a class method as a `QgsAbstractValidityCheck` since it gets confused | ||
when QGIS calls it later. So I had to move my checker method outside the plugin class. But then I couldn't | ||
access the language and dictionary that was set in the GUI! Some more searching gave me the following code: | ||
|
||
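```py | ||
from qgis.utils import plugins  # dictionary of the loaded plugin instances | ||
|
||
# Look up the running plugin by name and reuse the checker it created | ||
_instance = plugins['qgis-spellcheck'] | ||
checker = _instance.checker | ||
``` | ||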
|
||
With this I can pull out the named plugin and grab its spell checker, which was created in the plugin's | ||
`__init__` method. I seem to have a small issue: the user's profile is not set when that runs, which messes | ||
up where the personal dictionary is put (again, if you know how to fix this let me know). | ||
|
||
|
||
## Future Work | ||
|
||
Ideally, I'd like the spell checker to scan and highlight the text in the boxes as I type, but I fear that is | ||
beyond my understanding of the QGIS/Qt interface. Next highest on my wish list is for the list of spelling | ||
issues to be non-modal so I can cut and paste fixes into the text box, rather than having to memorise the | ||
correct spelling, close the window and then type it in (again, answers on a GitHub issue). | ||
|
||
I'm sure all sorts of things will come up once people start using it, so as usual issues and PRs are welcome | ||
at https://github.com/ianturton/qgis-spellcheck. | ||
|