This log describes all the changes made to ERRANT since its release.
- Missed one case of changing Levenshtein to rapidfuzz... Now fixed.
- Replaced the dependency on python-Levenshtein with rapidfuzz to overcome a licensing conflict. ERRANT and its dependencies now all use the MIT license. #34
- Added some new rules to reduce the number of OTHER-type 1:1 edits and classify them as something else. Specifically, there are now ~40% fewer 1:1 OTHER edits and ~15% fewer n:n OTHER edits overall (tested on the FCE and W&I training sets combined). The changes are as follows:
- A possessive suffix at the start of a merge sequence is now always split:
  Example: people life -> people 's lives
  Old: life -> 's lives (R:OTHER)
  New: ε -> 's (M:NOUN:POSS), life -> lives (R:NOUN:NUM)
- NUM <-> DET edits are now classified as R:DET; e.g. one (cat) -> a (cat). Thanks to @katkorre!
- Changed the string similarity score in the classifier from the Levenshtein ratio to the normalised Levenshtein distance based on the length of the longest input string. This is because we felt some ratio scores were unintuitive; e.g. smt -> something has a ratio score of 0.5 despite the insertion of 6 characters (the new normalised score is 0.33).
- The non-word spelling error rules were updated slightly to take the new normalised Levenshtein score into account. Additionally, dissimilar strings are now classified based on the POS tag of the correction rather than as OTHER; e.g. amougnht -> number (R:NOUN).
- The new normalised Levenshtein score is also used to classify many of the remaining 1:1 replacement edits that were previously classified as OTHER. Many of these are real-word spelling errors (e.g. their <-> there), but there are also some morphological errors (e.g. health -> healthy) and POS-based errors (e.g. transport -> travel). Note that these rules are a little complex and depend on both the similarity score and the lengths of the original and corrected strings. For example, form -> from (R:SPELL) and eventually -> finally (R:ADV) have the same similarity score of 0.5 yet are assigned different error types based on their string lengths.
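The scores quoted above can be reproduced with a small pure-Python sketch. This is not ERRANT's own code: the function names are hypothetical, the ratio is assumed to weight a substitution as two operations (which matches the 0.5 quoted for smt -> something), and the normalised score is assumed to be one minus the distance divided by the length of the longer string.

```python
def levenshtein(a: str, b: str, sub_cost: int = 1) -> int:
    """Edit distance with unit insert/delete cost and a configurable substitution cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                                  # delete ca
                curr[j - 1] + 1,                              # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> float:
    """Ratio-style score (assumption: a substitution counts as 2 operations)."""
    total = len(a) + len(b)
    return (total - levenshtein(a, b, sub_cost=2)) / total if total else 1.0


def normalised_similarity(a: str, b: str) -> float:
    """Assumed form of the new score: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1 - levenshtein(a, b) / longest if longest else 1.0


for orig, cor in [("smt", "something"), ("form", "from"), ("eventually", "finally")]:
    print(orig, cor, round(ratio(orig, cor), 2), round(normalised_similarity(orig, cor), 2))
```

Under these assumptions, form -> from and eventually -> finally receive the same normalised score, which is why the classifier also needs the string lengths to tell them apart.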
- Various minor updates:
- Changed the dependency version requirements in setup.py since ERRANT v2.2.x is not compatible with spaCy 3.
- Added a copy of the NLTK Lancaster stemmer to errant.en.lancaster and removed the NLTK dependency. It was overkill to require the entire NLTK package just for this stemmer so we now bundle it with ERRANT.
- Replaced the deprecated tokens_from_list function from spaCy v1 with the Doc function from spaCy v2 in Annotator.parse.
Fixed key error in the classifier for rare spaCy 2 POS tags: _SP, BES, HVS.
- ERRANT now works with spaCy v2.2. It is 4x slower, but this change was necessary to make it work on Python 3.7.
- spaCy 2 uses slightly different POS tags to spaCy 1 (e.g. auxiliary verbs are now tagged AUX rather than VERB) so I updated some of the merging rules to maintain performance.
- The character level cost in the sentence alignment function is now computed by the much faster python-Levenshtein library instead of Python's native difflib.SequenceMatcher. This makes ERRANT 3x faster!
- Various minor updates:
- Updated the English wordlist.
- Fixed a broken rule for classifying contraction errors.
- Changed a condition in the calculation of transposition errors to be more intuitive.
- Partially updated the ERRANT POS tag map to match the updated Universal POS tag map. Specifically, EX now maps to PRON rather than ADV, LS maps to X rather than PUNCT, and CONJ has been renamed CCONJ. I did not change the mapping of RP from PART to ADP yet because this breaks several rules involving phrasal verbs.
- Added an errant.__version__ attribute.
- Added a warning about using ERRANT with spaCy 2.
- Tidied some code in the classifier.
- ERRANT has been significantly refactored to accommodate a new API (see README). It should now also be much easier to extend to other languages.
- Added a setup.py script to make ERRANT pip installable.
- The Damerau-Levenshtein alignment code has been rewritten in a much cleaner Python implementation. This also makes ERRANT ~20% faster.
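For readers unfamiliar with Damerau-Levenshtein, the following is a minimal sketch of the restricted variant (optimal string alignment) over token lists. It is only an illustration with unit costs: the function name is hypothetical, and ERRANT's actual alignment uses its own cost function rather than the flat costs assumed here.

```python
def osa_distance(orig, cor):
    """Optimal string alignment distance between two sequences (unit costs)."""
    m, n = len(orig), len(cor)
    # dist[i][j] = cost of turning orig[:i] into cor[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete everything
    for j in range(n + 1):
        dist[0][j] = j  # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if orig[i - 1] == cor[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
            # Transposition of two adjacent items counts as a single operation
            if (i > 1 and j > 1
                    and orig[i - 1] == cor[j - 2]
                    and orig[i - 2] == cor[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[m][n]


print(osa_distance("I only can go".split(), "I can only go".split()))
```

The transposition case is what separates this from plain Levenshtein, which would need two substitutions for the swapped tokens.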
Note: None of these changes affect system output compared with the previous version. For the first pip release, we wanted to make sure v2.0.0 was fully compatible with the BEA-2019 shared task on Grammatical Error Correction.
Thanks to @sai-prasanna for inspiring some of these changes!
- The compare_m2.py evaluation script was refactored to make it easier to use.
- We tweaked the alignment code and merging rules to not only make ERRANT ~700% faster, but also slightly more accurate.
Specifically, we simplified the lemma cost so that it no longer repeatedly calls the lemmatiser for different parts-of-speech, and we now compute the character cost with Python's native difflib.SequenceMatcher instead of a character-based Damerau-Levenshtein alignment.
This significantly increased the speed, but also slightly decreased performance (~0.5 F1 worse), so we additionally revisited the merging rules. The new implementation now processes the largest combinations of adjacent non-matches first, instead of processing one alignment at a time, and now also features some new or slightly modified rules (see scripts/align_text.py for more information).
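A character cost based on difflib.SequenceMatcher, as described above, might look like the sketch below. The function name and the choice of "1 minus the ratio" are assumptions for illustration; how ERRANT combines this value with its other alignment costs is not shown here.

```python
from difflib import SequenceMatcher


def char_cost(orig: str, cor: str) -> float:
    """Character-level cost in [0, 1]: 0.0 for identical strings, 1.0 for no shared characters."""
    # ratio() is 2 * matches / (len(orig) + len(cor)), so similar strings give a low cost
    return 1.0 - SequenceMatcher(None, orig, cor).ratio()


for orig, cor in [("cat", "cat"), ("form", "from"), ("abc", "xyz")]:
    print(orig, cor, char_cost(orig, cor))
```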
The differences between the old and new version are summarised in the following table.
Dataset | Sents | Setting | P | R | F1 | Time (secs) |
---|---|---|---|---|---|---|
FCE Dev | 2371 | Old | 82.77 | 85.22 | 83.98 | 260 |
FCE Dev | 2371 | New | 84.00 | 85.52 | 84.75 | 40 |
FCE Test | 2805 | Old | 83.88 | 85.84 | 84.85 | 300 |
FCE Test | 2805 | New | 85.17 | 85.93 | 85.55 | 45 |
FCE Train | 30200 | Old | 82.69 | 85.12 | 83.89 | 2965 |
FCE Train | 30200 | New | 84.06 | 85.38 | 84.72 | 340 |
CoNLL-2013 | 1381 | Old | 82.64 | 82.45 | 82.54 | 315 |
CoNLL-2013 | 1381 | New | 83.27 | 82.24 | 82.75 | 45 |
CoNLL-2014.0 | 1312 | Old | 78.48 | 80.38 | 79.42 | 350 |
CoNLL-2014.0 | 1312 | New | 79.02 | 80.18 | 79.59 | 45 |
CoNLL-2014.1 | 1312 | Old | 82.50 | 82.73 | 82.61 | 385 |
CoNLL-2014.1 | 1312 | New | 84.04 | 82.85 | 83.44 | 50 |
NUCLE | 57151 | Old | 70.14 | 80.27 | 71.95 | 7565 |
NUCLE | 57151 | New | 73.20 | 81.16 | 76.97 | 725 |
Fix arbitrary reordering of edits with the same start and end span; e.g.
S I am happy .
A 2 2|||M:ADV|||really|||REQUIRED|||-NONE-|||0
A 2 2|||M:ADV|||very|||REQUIRED|||-NONE-|||0
VS.
S I am happy .
A 2 2|||M:ADV|||very|||REQUIRED|||-NONE-|||0
A 2 2|||M:ADV|||really|||REQUIRED|||-NONE-|||0
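One way to avoid this kind of arbitrary reordering is to sort edits by span only, using a stable sort so that ties keep their input order. The sketch below is an assumption about how such a fix could look, not the actual change made in ERRANT; the M2Edit class and both function names are hypothetical, and the field layout is taken from the M2 lines shown above.

```python
from typing import NamedTuple


class M2Edit(NamedTuple):
    start: int
    end: int
    err_type: str
    correction: str
    annotator: int


def parse_m2_edit(line: str) -> M2Edit:
    """Parse an M2 edit line such as 'A 2 2|||M:ADV|||very|||REQUIRED|||-NONE-|||0'."""
    fields = line.strip().split("|||")
    _, start, end = fields[0].split()  # "A", start offset, end offset
    return M2Edit(int(start), int(end), fields[1], fields[2], int(fields[-1]))


def sort_edits(edits):
    """Order edits by span; sorted() is stable, so equal spans keep their input order."""
    return sorted(edits, key=lambda e: (e.start, e.end))


lines = [
    "A 2 2|||M:ADV|||really|||REQUIRED|||-NONE-|||0",
    "A 2 2|||M:ADV|||very|||REQUIRED|||-NONE-|||0",
]
edits = sort_edits([parse_m2_edit(line) for line in lines])
print([e.correction for e in edits])
```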
Added support for multiple annotators in parallel_to_m2.py.
Before: python3 parallel_to_m2.py -orig <orig_file> -cor <cor_file> -out <out_file>
After: python3 parallel_to_m2.py -orig <orig_file> -cor <cor_file1> [<cor_file2> ...] -out <out_file>
This is helpful if you have multiple annotations for the same orig file.
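A command line that accepts one or more corrected files, as in the "After" usage above, can be expressed with argparse's nargs="+". This is a sketch of the idea rather than the script's actual argument handling; the build_parser name and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Convert parallel text to M2 format.")
    parser.add_argument("-orig", required=True, help="Path to the original text file.")
    # nargs="+" collects one or more values into a list, one per annotator
    parser.add_argument("-cor", required=True, nargs="+", help="One or more corrected files.")
    parser.add_argument("-out", required=True, help="Path to the output M2 file.")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv
args = build_parser().parse_args(
    ["-orig", "orig.txt", "-cor", "cor1.txt", "cor2.txt", "-out", "out.m2"]
)
print(args.orig, args.cor, args.out)
```

Because -cor is always a list, a single corrected file is just the one-annotator case and needs no special handling.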
In November, spaCy changed significantly when it became version 2.0.0. Although we have not tested ERRANT with this new version, the main change seemed to be a slight increase in performance (POS tagging, parsing, etc.) at a significant cost to speed. Consequently, we still recommend spaCy 1.9.0 for use with ERRANT.
ERRANT would sometimes run into memory problems if sentences were long and very different. We hence changed the default alignment from breadth-first to depth-first. This bypassed the memory problems, made ERRANT faster and barely affected results.
ERRANT v1.0 released.