
A notebook for Grammar Error Correction using the Google Research C4 200M Data Set


tpackolus/Grammar-Error-Correction_github

 
 


GEC

This repository contains code for the NeuroMatch Academy (NMA 2023) Deep Learning summer course project, which aims to evaluate grammar error correction (GEC) and detection with neural networks.

The project is mainly an exercise in replicating findings from different neural networks for GEC by re-implementing and evaluating several basic analyses.


Instructions:

Data conversion

Use this Kaggle notebook to get the data. Click "Copy & Edit" to create your own copy of the notebook. Then use the utility script, csv_to_hf5.py, to convert the tsv files to hdf5.

The script has two modes, available as subcommands. The first is 'single' mode, which converts one specified tsv file to hdf5. Usage:

python csv_to_hf5.py single [-h] [-i TSV_PATH] [-o HDF5_PATH] [--percentage 0.1 (default 0.1)]

The second is 'batch' mode. Here you specify a folder containing a set of input tsv files; each file is converted automatically and keeps the same name, with the tsv extension changed to hdf5. Usage:

python csv_to_hf5.py batch [-h] [-i INPUT_DIR] [--percentage 0.1 (default 0.1)]
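
The two usage lines above can be sketched as an argparse parser with two subcommands. This is only an illustration of the command-line shape, not the actual contents of csv_to_hf5.py: the function names build_parser and batch_output_path and the attribute names tsv_path, hdf5_path and input_dir are assumptions made for this example, and the conversion itself is left out.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with 'single' and 'batch' subcommands (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="csv_to_hf5.py")
    subparsers = parser.add_subparsers(dest="mode", required=True)

    # 'single' mode: one input tsv file, one output hdf5 file.
    single = subparsers.add_parser("single", help="convert one tsv file to hdf5")
    single.add_argument("-i", dest="tsv_path", type=Path)
    single.add_argument("-o", dest="hdf5_path", type=Path)
    single.add_argument("--percentage", type=float, default=0.1)

    # 'batch' mode: a folder of tsv files, output names derived from the inputs.
    batch = subparsers.add_parser("batch", help="convert every tsv file in a folder")
    batch.add_argument("-i", dest="input_dir", type=Path)
    batch.add_argument("--percentage", type=float, default=0.1)
    return parser


def batch_output_path(tsv_path: Path) -> Path:
    """Keep the file name and swap the extension for .hdf5 (assumed naming rule)."""
    return tsv_path.with_suffix(".hdf5")


if __name__ == "__main__":
    args = build_parser().parse_args(["batch", "-i", "data"])
    print(args.mode, args.input_dir, args.percentage)
    print(batch_output_path(Path("data/train.tsv")))
```

Because each subcommand declares its own --percentage option, the default of 0.1 applies in both modes when the flag is omitted.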

Run the model

Use the baseline notebook to run the model.


Credits

The original code is available here.
