Chinese Word Segmentation

The baseline architecture follows Ma et al. (2018). An improved transformer-based architecture is also implemented.

The dataset can be downloaded from here.

Model	AS	CITYU	MSR	PKU
This work	96.5	97.5	97.7	96.3
Ma et al. (2018)	96.2	97.2	97.4	96.1

The training script is cws/train.py. To list all of its input parameters, run:

python cws/train.py -h

You can generate segmented sentences by running cws/predictor.py.

python cws/predictor.py -h

Evaluation uses the official scoring scripts, located in scripts/score.
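
For intuition about what a segmentation score measures, here is a minimal sketch of word-level precision, recall, and F1. It is not the official script in scripts/score and may differ from it in details. It assumes one sentence per line with words separated by whitespace, and the function names `to_spans` and `segmentation_prf` are hypothetical, introduced only for this illustration.

```python
def to_spans(words):
    """Map a list of words to the set of (start, end) character spans they cover."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_sentences, pred_sentences):
    """Return word-level (precision, recall, F1) over paired sentences.

    Assumption: each sentence is a whitespace-segmented string, and the gold
    and predicted versions contain the same underlying characters.
    """
    correct = gold_total = pred_total = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans = to_spans(gold.split())
        pred_spans = to_spans(pred.split())
        # A predicted word is correct only if both of its boundaries match gold.
        correct += len(gold_spans & pred_spans)
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["我 喜欢 北京"]
    pred = ["我 喜 欢 北京"]
    p, r, f = segmentation_prf(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Comparing character spans rather than word strings means a word counts as correct only when it appears at the same position in both segmentations, so repeated words are not double-counted.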
