End-to-End Coreference Resolution with Different Higher-Order Inference Methods

This repo is a codebase snapshot of lxucs/coref-hoi; active issues and updates are maintained in the lxucs/coref-hoi repository.

This repository contains the implementation of the paper Multilingual Coreference Resolution with Harmonized Annotations. The code is based on the implementation of Revealing the Myth of Higher-Order Inference in Coreference Resolution.

Architecture

The basic end-to-end coreference model is a PyTorch re-implementation of the TensorFlow model and follows similar preprocessing (see this repository).

Files:

run.py: training and evaluation
model.py: the coreference model
higher_order.py: higher-order inference modules
analyze.py: result analysis
preprocess.py: converting CoNLL files to examples
tensorize.py: tensorizing examples
conll.py, metrics.py: same CoNLL-related files from the repository
experiments.conf: different model configurations

Basic Setup

Set up environment and data for training and evaluation:

Install Python3 dependencies: pip install -r requirements.txt
Create a directory for data that will contain all data files, models and log files; set data_dir = /path/to/data/dir in experiments.conf
Prepare the dataset (requires the CorefUD corpus):
python preprocess.py [config]
- e.g. python preprocess.py train_mbert_czech

Evaluation

The name of each model directory corresponds to a configuration in experiments.conf. Each directory contains two trained models.

If you want to use the official evaluator, download the CorefUD scorer and unzip it under this directory.

Evaluate a model on the dev/test set:

Download the corresponding model directory and unzip it under data_dir
python evaluate.py [config] [model_id] [gpu_id]
- e.g. Attended Antecedent: python evaluate.py train_spanbert_large_ml0_d2 May08_12-38-29_58000 0
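
If evaluation is launched from another Python script, the command above can be assembled as an argument list. This is only a minimal sketch: the helper name build_evaluate_command is hypothetical and not part of this repository, and it assumes evaluate.py is run from the repository root with exactly the three positional arguments shown above.

```python
import sys
from typing import List


def build_evaluate_command(config: str, model_id: str, gpu_id: int) -> List[str]:
    """Build the argv list for `python evaluate.py [config] [model_id] [gpu_id]`.

    Hypothetical helper: it only mirrors the command line documented above
    and does not validate the config name or the model id.
    """
    return [sys.executable, "evaluate.py", config, model_id, str(gpu_id)]


cmd = build_evaluate_command("train_spanbert_large_ml0_d2", "May08_12-38-29_58000", 0)
print(" ".join(cmd))
# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues if data paths or ids ever contain unusual characters.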

Training

python run.py [config] [gpu_id]

[config] can be any configuration in experiments.conf
Log file will be saved at your_data_dir/[config]/log_XXX.txt
Models will be saved at your_data_dir/[config]/model_XXX.bin
Tensorboard is available at your_data_dir/tensorboard
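
To find which saved models exist for a configuration, the directory layout above can be scanned directly. The following is a minimal sketch, assuming that the XXX part of model_XXX.bin is the [model_id] expected by evaluate.py (the README does not state this explicitly); list_model_ids is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path
from typing import List


def list_model_ids(data_dir: str, config: str) -> List[str]:
    """Return the XXX part of every data_dir/config/model_XXX.bin file.

    Assumption: XXX doubles as the model id passed to evaluate.py.
    """
    config_dir = Path(data_dir) / config
    if not config_dir.is_dir():
        return []
    ids = []
    for path in sorted(config_dir.glob("model_*.bin")):
        # path.stem drops ".bin"; the slice drops the "model_" prefix.
        ids.append(path.stem[len("model_"):])
    return ids


if __name__ == "__main__":
    # "/path/to/data/dir" is a placeholder for the data_dir set in experiments.conf.
    print(list_model_ids("/path/to/data/dir", "train_mbert_czech"))
```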

Configurations

Some important configurations in experiments.conf:

data_dir: the full path to the directory containing the datasets, models, and log files
bert_pretrained_name_or_path: the name/path of the pretrained BERT model (HuggingFace BERT models)
max_training_sentences: the maximum number of segments to use when a document is too long
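
The effect of max_training_sentences can be illustrated with a small sketch. The README does not say which segments are kept when a document is too long; the code below assumes a random contiguous window, which is a common choice in end-to-end coreference implementations but is not confirmed to be what this repository does. The function truncate_segments is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def truncate_segments(
    segments: Sequence[T],
    max_training_sentences: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Keep at most `max_training_sentences` consecutive segments.

    Assumption: the kept window is contiguous and its start is sampled
    uniformly at random; short documents are returned unchanged.
    """
    if max_training_sentences <= 0:
        raise ValueError("max_training_sentences must be positive")
    if len(segments) <= max_training_sentences:
        return list(segments)
    rng = rng or random.Random()
    # randint is inclusive on both ends, so the last valid start is included.
    start = rng.randint(0, len(segments) - max_training_sentences)
    return list(segments[start:start + max_training_sentences])


if __name__ == "__main__":
    document = [f"segment_{i}" for i in range(10)]
    print(truncate_segments(document, 3, random.Random(0)))
```

Keeping the window contiguous preserves local context between mentions, at the cost of dropping coreference links that cross the window boundary.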

Results

dataset	F1	F1 (without singletons)
catalan	50.29	62.78
czech	60.52	66.64
czech-pcedt	69.59	69.73
english-gum	50.80	65.76
english-parcor	57.47	58.12
german	45.35	58.89
german-parcor	55.40	56.51
hungarian	56.15	57.40
lithuanian	67.02	67.90
polish	43.13	62.39
russian	62.33	62.43
spanish	50.22	64.81

avg	54.19	62.48
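
For comparison with the avg row, an unweighted mean over the per-dataset rows can be computed as sketched below. The README does not state how avg is obtained (for example, whether it is weighted or computed over a different set of datasets), so a difference between this plain mean and the reported avg does not by itself indicate an error. The values are copied from the table above; unweighted_means is a hypothetical helper.

```python
from statistics import mean
from typing import List, Tuple

# (dataset, F1, F1 without singletons), copied from the table above.
ROWS: List[Tuple[str, float, float]] = [
    ("catalan", 50.29, 62.78),
    ("czech", 60.52, 66.64),
    ("czech-pcedt", 69.59, 69.73),
    ("english-gum", 50.80, 65.76),
    ("english-parcor", 57.47, 58.12),
    ("german", 45.35, 58.89),
    ("german-parcor", 55.40, 56.51),
    ("hungarian", 56.15, 57.40),
    ("lithuanian", 67.02, 67.90),
    ("polish", 43.13, 62.39),
    ("russian", 62.33, 62.43),
    ("spanish", 50.22, 64.81),
]


def unweighted_means(rows: List[Tuple[str, float, float]]) -> Tuple[float, float]:
    """Return the plain (macro) mean of both F1 columns, rounded to 2 decimals."""
    f1 = mean(row[1] for row in rows)
    f1_no_singletons = mean(row[2] for row in rows)
    return round(f1, 2), round(f1_no_singletons, 2)


if __name__ == "__main__":
    print(unweighted_means(ROWS))
```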

Citation

@inproceedings{pravzak2021multilingual,
  title={Multilingual Coreference Resolution with Harmonized Annotations},
  author={Pra{\v{z}}{\'a}k, Ond{\v{r}}ej and Konop{\'\i}k, Miloslav and Sido, Jakub},
  booktitle={Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)},
  pages={1119--1123},
  year={2021}
}
