LexiconAugmentedNER

This is the implementation of our arXiv paper "Simplify the Usage of Lexicon in Chinese NER", which avoids complicated operations for incorporating a word lexicon into Chinese NER. We show that incorporating a lexicon into Chinese NER can be quite simple and, at the same time, effective.

Source code description

Requirement:

Python 3.6
PyTorch 0.4.1

Input format:

CoNLL format, with each character and its label on one line, separated by whitespace. The "BMES" tag scheme is preferred.

别 O
错 O
过 O
邻 O
近 O
大 B-LOC
鹏 M-LOC
湾 E-LOC
的 O
湿 O
地 O
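The input format described above can be read with a few lines of Python. The following is a minimal sketch, not the loader used in this repository: the helper names `read_conll` and `bmes_spans` are hypothetical, and it assumes that a blank line ends a sentence and that single-character entities carry an `S-` prefix, as in the usual BMES convention.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (character, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group "char tag" lines into sentences; a blank line ends a sentence."""
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


def bmes_spans(sentence: Sentence) -> List[Tuple[str, str]]:
    """Return (entity_text, entity_type) pairs decoded from BMES tags.

    The entity type is taken from the closing E- (or S-) tag; this sketch
    does not check that it matches the type on the opening B- tag.
    """
    spans, buffer = [], ""
    for char, tag in sentence:
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((char, label))
            buffer = ""
        elif prefix == "B":
            buffer = char
        elif prefix == "M" and buffer:
            buffer += char
        elif prefix == "E" and buffer:
            spans.append((buffer + char, label))
            buffer = ""
        else:  # "O" or a malformed sequence resets the buffer
            buffer = ""
    return spans


sample = """别 O
错 O
过 O
邻 O
近 O
大 B-LOC
鹏 M-LOC
湾 E-LOC
的 O
湿 O
地 O
""".splitlines()

sentences = read_conll(sample)
print(bmes_spans(sentences[0]))
```

Running this on the sample sentence should recover the single location entity 大鹏湾, which is a quick way to check that a file follows the expected layout before training.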

Pretrain embedding:

The pretrained embeddings (word embedding, char embedding and bichar embedding) are the same as those used by Lattice LSTM.

Run the code:

Download the character embeddings and word embeddings from Lattice LSTM and put them in the data folder.
Download the four datasets into data/MSRANER, data/OntoNotesNER, data/ResumeNER and data/WeiboNER, respectively.
To train on the four datasets:

To train on OntoNotes:

python main.py --train data/OntoNotesNER/train.char.bmes --dev data/OntoNotesNER/dev.char.bmes --test data/OntoNotesNER/test.char.bmes --modelname OntoNotes --savedset data/OntoNotes.dset

To train on Resume:

python main.py --train data/ResumeNER/train.char.bmes --dev data/ResumeNER/dev.char.bmes --test data/ResumeNER/test.char.bmes --modelname Resume --savedset data/Resume.dset --hidden_dim 200

To train on Weibo:

python main.py --train data/WeiboNER/train.all.bmes --dev data/WeiboNER/dev.all.bmes --test data/WeiboNER/test.all.bmes --modelname Weibo --savedset data/Weibo.dset --lr=0.005 --hidden_dim 200

To train on MSRA:

python main.py --train data/MSRANER/train.char.bmes --dev data/MSRANER/dev.char.bmes --test data/MSRANER/test.char.bmes --modelname MSRA --savedset data/MSRA.dset
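Before launching any of the commands above, it may help to confirm that the dataset files are where the commands expect them. The sketch below is not part of the repository: `EXPECTED_FILES` and `missing_files` are hypothetical names, and the paths are simply copied from the training commands in this README.

```python
from pathlib import Path

# Paths copied from the training commands in this README.
EXPECTED_FILES = {
    "OntoNotes": [
        "data/OntoNotesNER/train.char.bmes",
        "data/OntoNotesNER/dev.char.bmes",
        "data/OntoNotesNER/test.char.bmes",
    ],
    "Resume": [
        "data/ResumeNER/train.char.bmes",
        "data/ResumeNER/dev.char.bmes",
        "data/ResumeNER/test.char.bmes",
    ],
    "Weibo": [
        "data/WeiboNER/train.all.bmes",
        "data/WeiboNER/dev.all.bmes",
        "data/WeiboNER/test.all.bmes",
    ],
    "MSRA": [
        "data/MSRANER/train.char.bmes",
        "data/MSRANER/dev.char.bmes",
        "data/MSRANER/test.char.bmes",
    ],
}


def missing_files(root="."):
    """Return the expected dataset paths that do not exist under `root`."""
    root = Path(root)
    return [
        path
        for paths in EXPECTED_FILES.values()
        for path in paths
        if not (root / path).is_file()
    ]


# Run from the repository root; prints nothing when every file is present.
for path in missing_files():
    print("missing:", path)
```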

To train/test on your own data: replace the file paths in the command with your own and run it.
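If your own data is tagged in BIO rather than the BMES scheme this README prefers, the tags can be converted first. The following is a minimal sketch under stated assumptions: `bio_to_bmes` is a hypothetical helper that is not provided by this repository, the input is assumed to be well-formed BIO (`B-X`, `I-X`, `O`), and single-character entities are assumed to map to `S-X`.

```python
from typing import List


def bio_to_bmes(tags: List[str]) -> List[str]:
    """Convert one sentence of BIO tags to BMES tags."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same type.
        continues = next_tag == "I-" + label
        if prefix == "B":
            converted.append(("B-" if continues else "S-") + label)
        else:  # prefix "I": inside an entity, either middle or end
            converted.append(("M-" if continues else "E-") + label)
    return converted


print(bio_to_bmes(["O", "B-LOC", "I-LOC", "I-LOC", "O", "B-PER"]))
```

Each converted tag sequence can then be written back next to its characters, one "character tag" pair per line, to match the input format shown earlier.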
