This repository contains materials for our tutorial on automatic grammatical error correction (GEC): R. Grundkiewicz, C. Bryant, M. Felice, A Crash Course in Automatic Grammatical Error Correction, COLING 2020.
Links and materials:
- Tutorial proposal with the recommended reading list
- Slides (Parts I-V)
- Bibliography
- List of GEC resources
Part I: Introduction
- About the tutorial
- Task definition
- Challenges
Part II: Historical and recent approaches
- Rule-based methods
- Language models
- Error-type classifiers
- Statistical machine translation
- Deep neural networks
- Shared tasks
Part III: Data and evaluation
- Data annotation
- Error corpora
- Evaluation metrics
- Human evaluation
Part IV: Neural grammatical error correction
- Neural approach to GEC
- GEC as low-resource NMT
- Data sparsity
- Correction efficacy
- Beyond the NMT framework
Part V: Recent and future work
- Findings from the BEA-2019 shared task
- Towards unsupervised GEC
- Non-English languages
- Future work
Data sets, evaluation scripts and other resources related to the field of automatic grammatical error correction.
Publicly available error corpora for English:
- W&I+LOCNESS Corpora [paper] [download v2.1]
- FCE Data Set [paper] [download v2.1]
- JFLEG (JHU FLuency-Extended GUG) Corpus [download]
- AESW (Automated Evaluation of Scientific Writing Data Set) [download v1.2]
- NUCLE (NUS Corpus of Learner English) [paper] [download v3.3]
- Annotated Test Data from the CoNLL 2013 & 2014 Shared Tasks [CoNLL-2013] [CoNLL-2014]
- 8 additional annotations for the CoNLL 2014 data set [paper] [download]
- Lang-8 Learner Corpora [download v2.0] [download pre-processed]
- The WikEd Error Corpus [download v1.0]
- GMEG data sets [paper] [download]
- CWEB (Corrected Websites) data sets [paper] [download]
System outputs:
- System outputs from the CoNLL 2014 Shared Task [download]
- System outputs from the BEA 2019 Shared Task [download]
Publicly available error corpora for other languages:
- Chinese: NLPCC 2018 Task 2 GEC [paper] [website], NLPTEA 2016 CGED [paper] [website]
- Czech: AKCES-GEC [paper] [website]
- German: Falko & MERLIN Corpora [paper] [website]
- Polish: PlEWiC [website]
- Russian: RULEC-GEC dataset [paper] [website]
Evaluation tools:
- M2Scorer [software] [paper]
- ERRANT [software] [paper]
- GLEU [software] [paper #1] [paper #2]
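M2Scorer and ERRANT both report precision, recall and F0.5 over edits. As a rough illustration of how those three numbers relate, here is a minimal sketch that computes them from edit counts. The function name `f_beta` and the handling of empty counts (returning 1.0 when there is nothing to propose or nothing to find) are assumptions made for this example; neither tool exposes this exact function, and their edit extraction and matching steps are not shown.

```python
from typing import Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) from edit-level counts.

    tp: system edits that match a reference edit
    fp: system edits that match no reference edit
    fn: reference edits the system missed
    beta=0.5 weights precision twice as much as recall.
    """
    # Assumption for this sketch: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Hypothetical counts, not taken from any real system output.
    p, r, f = f_beta(tp=2, fp=2, fn=6)
    print(f"P={p:.4f} R={r:.4f} F0.5={f:.4f}")
```

Because beta is below 1, a system that proposes few but accurate edits scores higher than one that finds more errors at the cost of many wrong corrections.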
Shared tasks:
- BEA 2019 Shared Task: Grammatical Error Correction [website] [paper]
- NLPCC 2018 Shared Task 2 - Grammatical Error Correction for Chinese [website] [paper]
- Automated Evaluation of Scientific Writing Shared Task 2016 [website]
- The Second QALB Shared Task on Automatic Text Correction for Arabic 2015 [paper]
- CoNLL-2014 Shared Task: Grammatical Error Correction [website] [paper]
- CoNLL-2013 Shared Task: Grammatical Error Correction [website] [paper]
Other resources:
- http://nlpprogress.com/english/grammatical_error_correction.html
- Slides from the COLING 2014 tutorial on Automatic GEC for Language Learners
(This list is incomplete; please feel free to open a pull request if you would like to add something.)
Citation:
@inproceedings{grundkiewicz-etal-2020-crash,
    title = "A Crash Course in Automatic Grammatical Error Correction",
    author = "Grundkiewicz, Roman and Bryant, Christopher and Felice, Mariano",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-tutorials.6",
    pages = "33--38",
}