repost

ianturton · Jul 16, 2024 · 44767bf · 44767bf
1 parent 1f1d381
commit 44767bf

Showing 3 changed files with 206 additions and 0 deletions.
diff --git a/_posts/2023-11-11-geojson.md b/_posts/2023-11-11-geojson.md
@@ -0,0 +1,98 @@
+---
+layout: post
+title: Is GeoJSON a spatial data format?
+date: 2023-11-11
+categories: gis
+---
+# Is GeoJSON a good spatial data format?
+
+A few days ago on Mastodon [Eli Pousson](https://fosstodon.org/@[email protected])
+asked:
+
+> Can anyone suggest examples of files that can contain location info but aren't often considered spatial data 
+> file formats?
+>
+
+He suggested EXIF, [Iván Sánchez Ortega](@[email protected] )
+followed up with spreadsheets, and being devilish I said GeoJSON.
+
+This led to more discussion, with people asking why I thought that, so instead of being flippant I thought
+about it. This blog post is the result of those thoughts, which I thought were kind of obvious but, from
+things people have said since, maybe aren't that obvious.
+
+I've been a developer for most of my career, so my main interest in a spatial data format is that:
+
+1. it stores my spatial data as I want it to,
+2. it's fast to read and, to a lesser extent, write,
+3. it's easy to manage.
+
+One seems obvious: if I store a point then ask for it back, I want to get that point back (to the limit
+of the precision of the processor's floating point). If a format can't manage that then please don't use it.
+This is not common but Excel comes to mind as a program that takes good data and trashes it. If it isn't
+changing [gene names into
+dates](https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates) then
+it's [reordering the dbf file to destroy your
+shapefile](https://gis.stackexchange.com/questions/132359/how-is-attribute-data-in-dbf-file-tied-to-shapefile-location-data-in-shp-file).
+GeoJSON can also fail at this, as the standard says that I must store the data in WGS 84 (lon/lat). That is
+fine if my data is already stored that way, but suppose I have some high quality OSGB data that is carefully
+surveyed to fractions of a millimetre. The underlying code does a conversion to WGS 84 in the background, and
+further the developer wanted to save space and limited the number of decimal places to, say, 6 (OK, [that was
+me](https://osgeo-org.atlassian.net/browse/GEOT-6650)). When it gets converted back to OSGB I'm looking at
+centimetres (or worse) of error, but given the vagaries of floating point representation I may not be
+able to tell.
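
To put a rough number on that, here is a back-of-the-envelope sketch of how far a point can move when lon/lat degrees are rounded to 6 decimal places. It assumes a spherical Earth with a mean radius of 6,371 km and uses a latitude near Glasgow purely as an example; the function name and constant are mine, not taken from any library, and a real check would round-trip through the actual OSGB transformation.

```python
import math

# Mean Earth radius in metres; a spherical approximation that is good enough
# for an order-of-magnitude estimate (an assumption, not a geodetic calculation).
EARTH_RADIUS_M = 6_371_000.0


def rounding_error_m(decimals, latitude_deg):
    """Worst-case north-south and east-west shift from rounding lon/lat degrees."""
    half_step_deg = 0.5 * 10 ** -decimals  # largest change rounding can make
    metres_per_deg = math.radians(1.0) * EARTH_RADIUS_M
    north_south = half_step_deg * metres_per_deg
    # Lines of longitude converge towards the poles, so a degree of longitude
    # covers less ground the further you are from the equator.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = rounding_error_m(decimals=6, latitude_deg=55.86)  # roughly Glasgow
print(f"6 decimal places: up to {ns * 100:.1f} cm north-south, {ew * 100:.1f} cm east-west")
```

Under those assumptions the worst case is a few centimetres in each direction, which is a long way from fractions of a millimetre.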
+
+Two comes from being a GeoServer developer: a largish chunk of the time taken to draw a web map (or stream
+out a WFS file) is taken up by reading the data from the disk. Much of the rest of the time is spent converting
+the data into a form that we can draw. Ideally, we only want to read in the features needed for the map the
+user has requested (actually, ideally we want to **not** read in most of the data by having it already be in
+the cache, but that is hard to do). So we like indexed datasets: both spatial indexes and attribute indexes can
+substantially speed up map drawing. As the size of spatial datasets increases, the time taken to fetch the
+next feature from the store becomes more and more important. An index allows the program to skip to the
+correct place in the file for either a specific feature or for features that are in a specific place or
+contain a certain attribute with the requested value. This is a great time saver: imagine trying to look
+something up in a big book by using the index compared to paging through it reading each page in turn.
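
As a toy illustration of the idea, here is a minimal grid index over points. Real stores use smarter structures (R-trees, quadtrees and friends) and index the position of records on disk rather than an in-memory list, so treat the cell size, function names and sample points below as made up for the example.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # grid cell size in coordinate units (arbitrary, for illustration)


def cell_of(x, y):
    """Return the grid cell that contains the point (x, y)."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


def build_index(points):
    """Map each grid cell to the ids of the points that fall inside it."""
    index = defaultdict(list)
    for point_id, (x, y) in enumerate(points):
        index[cell_of(x, y)].append(point_id)
    return index


def query(index, points, min_x, min_y, max_x, max_y):
    """Return ids of points inside the box, visiting only the overlapping cells."""
    first_col, first_row = cell_of(min_x, min_y)
    last_col, last_row = cell_of(max_x, max_y)
    hits = []
    for col in range(first_col, last_col + 1):
        for row in range(first_row, last_row + 1):
            # .get() avoids creating empty cells in the defaultdict
            for point_id in index.get((col, row), []):
                x, y = points[point_id]
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    hits.append(point_id)
    return hits


points = [(0.5, 0.5), (3.2, 4.1), (3.8, 4.9), (9.0, 9.0)]
index = build_index(points)
print(query(index, points, 3.0, 4.0, 4.0, 5.0))
```

The query only ever looks at the four cells that overlap the requested box, so the two far away points are never touched, which is the whole point of having an index.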
+
+After one or more indexes, the main thing I look for in a format is a binary encoding that is easy to read
+(and write). GeoJSON and GML are both problematic here as they are text formats (which is great in a transfer
+format), so for every coordinate of every spatial object the computer has to read in a series of digits
+(and punctuation) and convert that into an actual binary number that it can understand. This is a slow
+operation (by computer speeds anyway) and if I have a couple of million points in my coastline file then I
+don't want to do 4 million slow operations (two coordinates per point) before I even think of drawing something.
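
The difference between the two routes can be sketched with the standard library. This is only an illustration of the two kinds of work involved, using made-up coordinates and a little-endian layout I picked for the example; it is not a benchmark and not the layout of any particular format.

```python
import json
import struct

coords = [(-4.2518, 55.8642), (-3.1883, 55.9533)]  # made-up lon/lat pairs

# Text route: every number is a run of characters that has to be parsed
# back into a floating point value.
as_text = json.dumps(coords)
from_text = [tuple(pair) for pair in json.loads(as_text)]

# Binary route: each coordinate is 8 bytes that are already an IEEE 754 double,
# so reading is a fixed-size copy (little-endian is chosen here for illustration).
flat = [value for pair in coords for value in pair]
as_binary = struct.pack(f"<{len(flat)}d", *flat)
unpacked = struct.unpack(f"<{len(flat)}d", as_binary)
from_binary = list(zip(unpacked[0::2], unpacked[1::2]))

print(len(as_text), "bytes of text;", len(as_binary), "bytes of binary")
print(from_text == coords, from_binary == coords)
```

Both routes give back the same numbers; the binary one just skips the digit-by-digit conversion and always knows exactly how many bytes the next coordinate occupies.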
+
+Three: I have to interact with users on a fairly regular basis and in a lot of cases these are not spatial
+data experts. If a format comes with up to a dozen similarly named files (that are all important) that a GIS
+will refuse to process unless you guess which is the important one then it is more of a pain than a help. And
+yes, shapefile, I'm looking at you. If your process still makes use of shapefiles please, please stop doing
+that to your users (and the support team) and switch over to GeoPackages, which can store hundreds of data
+sets inside a single file. All good GIS products can process them by now; they have been an OGC standard for
+nearly 10 years. If you don't think that shapefiles are confusing, go and ask your support team how often they
+have been sent just the `.shp` file (or 11 files but not the `.sbn`) or how often they have seen people who
+have deleted all the non-`.shp` files to save disk space.
+
+My other objection to GeoJSON is that I don't know what the structure (or schema) of the data set is until I
+have read the entire file. That last record could add several bonus attributes; in fact any (or all) of the
+records could do that. From a parser's point of view it is a nightmare. At least GML provides me with a fixed
+schema and enforces it throughout the file.
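
A small sketch of what that means in practice: the only way to be sure of the attribute list is to walk every feature. The FeatureCollection below is invented for the example and `discover_schema` is a hypothetical helper, not part of any GeoJSON library.

```python
import json

# A tiny made-up FeatureCollection: the last feature adds a property the others lack.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]},
   "properties": {"name": "a"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 1]},
   "properties": {"name": "b"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2, 2]},
   "properties": {"name": "c", "height": 12.5}}
]}
"""


def discover_schema(collection):
    """Collect every property name and the Python types seen for it."""
    schema = {}
    for feature in collection["features"]:
        # "properties" may legitimately be null in GeoJSON, hence the "or {}"
        for key, value in (feature.get("properties") or {}).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


schema = discover_schema(json.loads(geojson_text))
print(schema)
```

Stop after the first two features and you would conclude there is a single `name` attribute; `height` only turns up in the very last record.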
+
+When I'm storing data (as opposed to transferring it) I use PostGIS: it's fast and accurate, can store my data
+in whatever projection I choose and is capable of interfacing with any GIS program I am likely to use. If
+I'm writing new code then it provides good, well tested libraries in all the languages I care about so I don't
+have to get into the weeds of parsing binary formats. If I fetch a feature from PostGIS it will have exactly
+the attributes I was expecting, no more and no less. It has good indexes and a nifty DSL (SQL) that I can use
+to express my queries, which get dealt with by a cool query optimiser that knows way more than I do about how
+to access data in the database.
+
+If for some reason I need to access my data while I'm travelling or share it with a colleague then I will use
+a GeoPackage, which is a neat little database all packaged up in a single file. It's not as quick as PostGIS so
+I wouldn't use it for millions of records, but for most day to day GIS data sets it's great. You can even store
+your QGIS styles and project in it to make it a single file project transfer format.
+
+One final point, I sometimes see people preaching that we should go cloud native (and often serverless) by 
+embracing "modern" standards like GeoJSON and COGs. GeoJSON should never be used as a cloud native storage 
+option (unless it's so small you can read it once and cache it in memory in which case why are you using the 
+cloud) as it is large (yes, I know it compresses well) and slow to parse (and slower still if you compressed 
+it first) and can't be indexed. That means you have to copy the whole file from a disk on the far side of a
+slow internet connection. I don't care if you have fibre to the door, it is still slow compared to the disk in
+your machine!
+
+![The Jack Sparrow worst pirate meme but for GeoJSON](/images/geojson.jpg )
diff --git a/_posts/2024-07-16-spelling.md b/_posts/2024-07-16-spelling.md
@@ -0,0 +1,108 @@
+---
+layout: post
+title: Adding a spell check to QGIS
+date: 2024-07-16
+categories: foss
+---
+
+# Adding a Spell Check to QGIS
+
+(Or what to do on a rainy bank holiday in Glasgow)
+
+This Monday was a local bank holiday in Glasgow (or at least at the university), a remnant of when the whole
+town took a train to Blackpool in the same two weeks so that the ship builders and steel works could stop in a
+coordinated fashion. As is required in the UK the weather was awful, so I stayed in and, being bored, looked at
+my long list of possible projects. I picked one that has been kicking around on the list for a while: adding a
+spell checker to QGIS. As a dyslexic I have spell checking turned on in nearly every program I enter text
+into, including `vim`, `IntelliJ` and my browser. So I have always felt that what QGIS really needed was a way
+to spell check maps before I printed them at A3 and put them on the wall.
+
+Back in 2019 North Road wrote a [blog post about custom layout
+checks](https://north-road.com/2019/01/14/on-custom-layout-checks-in-qgis-3-6-and-how-they-can-do-your-work-for-you/)
+and ended it with a throwaway comment: "It’d even be possible to hook into one of the available Python spell
+checking libraries to write a spelling check!". I came across this when I was trying to see if there was an
+easy way for my students (many of whom have English as a second language) to avoid handing in projects with
+glaring (i.e. I can see them) spelling errors in the title. So I stuck the link on my backlog, until the
+proverbial rainy day came along.
+
+## Implementation
+
+Obviously I'm the last person who should be allowed to write spell checking software, but the joy of open
+source is that for things like this someone else has almost certainly already done it. So a quick DuckDuckGo
+search found me installing `pyspellchecker`, which seemed like it would do what I want. It has a pretty easy
+interface: once you've created a spell checker object, you can just pass in a list of words and it will return
+the (probably) misspelled ones. There is also a method to give the most likely correction and another to give
+you a list of other possibilities. Armed with this I could create a method to find and check all the text
+elements of a print layout.
+
+```py
+import string
+
+from qgis.core import (QgsAbstractValidityCheck, QgsLayoutItemLabel,
+                       QgsValidityCheckResult, check)
+from spellchecker import SpellChecker  # installed by the pyspellchecker package
+
+
+@check.register(type=QgsAbstractValidityCheck.TypeLayoutCheck)
+def layout_check_spelling(context, feedback):
+    layout = context.layout
+    results = []
+    checker = SpellChecker()
+
+    for i in layout.items():
+        # only label items carry free text that needs checking
+        if isinstance(i, QgsLayoutItemLabel):
+            text = i.currentText()
+            # strip surrounding punctuation and drop tokens that end up empty
+            tokens = [word.strip(string.punctuation) for word in text.split()]
+            misspelled = checker.unknown([t for t in tokens if t])
+            for word in misspelled:
+                res = QgsValidityCheckResult()
+                res.type = QgsValidityCheckResult.Warning
+                res.title = 'Spelling Error?'
+                template = f"""
+                '<strong>{word}</strong>' may be misspelled, would
+                '<strong>{checker.correction(word)}</strong>' be a better choice?
+                """
+                # newer pyspellchecker releases may return None if nothing is found
+                possibles = checker.candidates(word) or set()
+                if len(possibles) > 1:
+                    template += """
+                    Or one of:<br/>
+                    <ul>
+                    """
+                    for t in possibles:
+                        template += f"<li>{t}</li>\n"
+                    template += '</ul>'
+                res.detailedDescription = template
+                results.append(res)
+    return results
+```
+
+And in theory, that was that! But I'm pretty sure that my students (and everyone else) wouldn't want to
+cut and paste that into the console every time they wanted to spell check a map. So I looked at how to
+package this up for QGIS. I built a plugin (using the plugin builder tool), but then things got a little
+tricky as I can't see any way for a plugin to add itself to the print layout rather than the main QGIS window
+(please let me know if it is possible), and it seemed unintuitive to make people press a button in one window
+to affect another one. Besides, the whole point of it being a `QgsAbstractValidityCheck` was that the method is
+automatically run on print. So I didn't need most of the plugin code, or did I? On further thought I did: there
+is a need for some GUI so the user can pick which language they want to use in the spell check. `pyspellchecker`
+can spell check English, Spanish, French, Portuguese, German, Italian, Russian, Arabic, Basque, Latvian and
+Dutch (so if one of those is your language then please test this for me). I also thought that providing the
+option to supply a personal dictionary other than the default might be useful. So that made use of the dialog
+that pops up when you hit the plugin.
+
+But it turns out you can't register a class method as a `QgsAbstractValidityCheck` since it gets confused
+when QGIS calls it later. So I had to move my checker method outside the plugin class. But then I couldn't
+access the language and dictionary that were set in the GUI! Some more searching gave me the following code:
+
+```py
+  from qgis.utils import plugins  # dict of loaded plugin instances, keyed by plugin name
+
+  _instance = plugins['qgis-spellcheck']
+  checker = _instance.checker
+```
+
+With this I can pull out the named plugin and grab its spell checker, which was created in the plugin's
+`__init__` method. I seem to have a small issue in that the user's profile is not set when that runs, which
+messes up where the personal dictionary is put (again, if you know how to fix this let me know).
+
+
+## Future Work
+
+Ideally, I'd like the spell checker to scan and highlight the text in the boxes as I type, but I fear that is
+beyond my understanding of the QGIS/Qt interface. Next highest on my wish list is for the list of spelling
+issues to be non-modal so I can cut and paste fixes into the text box, rather than having to memorise the
+correct spelling, close the window and then type it in (again, answers on a GitHub issue).
+
+I'm sure all sorts of things will come up once people start using it, so as usual issues and PRs are welcome 
+at https://github.com/ianturton/qgis-spellcheck.
+
diff --git a/images/geojson.jpg b/images/geojson.jpg