The baseline architecture follows Ma et al. (2018). An improved, transformer-based architecture is also implemented.
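Bi-LSTM segmenters in the style of Ma et al. treat segmentation as character-level sequence tagging. The sketch below shows one common label encoding, the four-tag BIES scheme. It assumes BIES is the tag set in use (the scripts in this repository may use a different one), and `words_to_bies` is a hypothetical helper rather than code from this repository.

```python
from typing import List


def words_to_bies(words: List[str]) -> List[str]:
    """Map a segmented sentence to one BIES tag per character.

    B = first character of a multi-character word, I = inside,
    E = last character, S = single-character word.
    (Assumed tag set; hypothetical helper for illustration.)
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend("I" * (len(word) - 2))  # zero or more inside tags
            tags.append("E")
    return tags


if __name__ == "__main__":
    sentence = ["我", "喜欢", "自然语言处理"]
    chars = list("".join(sentence))
    # Pair each character with its tag, i.e. one training example for a tagger.
    print(list(zip(chars, words_to_bies(sentence))))
```

Each character becomes one input position and each tag one output label, so a sentence of n characters yields exactly n labels.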
The dataset (the four SIGHAN 2005 Bakeoff corpora: AS, CITYU, MSR and PKU) can be downloaded from here.
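The segmented corpus files are plain text. A minimal reader, assuming UTF-8 encoding, one sentence per line and whitespace between words, might look like this; `read_segmented_corpus` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import Iterator, List


def read_segmented_corpus(path: str) -> Iterator[List[str]]:
    """Yield each non-blank line of a segmented corpus as a list of words.

    Assumes UTF-8 text, one sentence per line, words separated by
    whitespace. str.split() with no argument splits on any Unicode
    whitespace, including the full-width space U+3000.
    """
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            words = line.split()
            if words:  # skip blank lines
                yield words
```

For example, `for words in read_segmented_corpus("pku_training.utf8"): ...` iterates over sentences; the file name is only illustrative and depends on where the download is unpacked.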
Word segmentation F1 scores (%) on the test set of each corpus:

| Model | AS | CITYU | MSR | PKU |
|---|---|---|---|---|
| This work | 96.5 | 97.5 | 97.7 | 96.3 |
| Ma et al. (2018) | 96.2 | 97.2 | 97.4 | 96.1 |
The training script is `cws/train.py`. Run the following to see all of its input parameters:

```
python cws/train.py -h
```
Segmented sentences can be generated with `cws/predictor.py`. To list its input parameters, run:

```
python cws/predictor.py -h
```
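When a tagger predicts one label per character, the labels have to be turned back into words to produce segmented output. The sketch below decodes BIES tags; it assumes that tag set, does not claim to reproduce what `cws/predictor.py` does internally, and `bies_to_words` is a hypothetical helper.

```python
from typing import List


def bies_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their predicted BIES tags.

    A word is closed after an E or S tag. A B or S tag also flushes any
    word left open, so ill-formed sequences (e.g. B followed by B) still
    produce a segmentation covering every character.
    """
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # flush an unfinished word
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # sentence ended inside a word
        words.append(current)
    return words
```

Joining the result with spaces, e.g. `" ".join(bies_to_words(chars, tags))`, gives the one-sentence-per-line, space-separated format used by the corpora. Tolerating malformed tag sequences is a design choice; a constrained decoder (for example a CRF layer) would avoid them instead.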
Evaluation uses the official scoring script in `scripts/score`.
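The official script reports word-level precision, recall and F1. To illustrate how these are computed, the simplified sketch below counts a predicted word as correct only when its character span exactly matches a gold word. It is not a replacement for `scripts/score` (which reports additional statistics), and `word_spans` and `prf` are hypothetical helpers.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Return the (start, end) character offsets of each word."""
    spans = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 over aligned sentence lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files differ in sentence count")
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        if "".join(g) != "".join(p):
            raise ValueError("gold and predicted sentences differ in characters")
        gs, ps = word_spans(g), word_spans(p)
        correct += len(gs & ps)  # words whose boundaries both match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    # Harmonic mean of P and R, written as 2 * correct / (n_gold + n_pred).
    f1 = 2 * correct / (n_gold + n_pred) if n_gold + n_pred else 0.0
    return precision, recall, f1
```

For instance, with gold `我 喜欢 自然语言处理` and prediction `我 喜欢 自然 语言 处理`, two of the five predicted words match, so precision is 2/5, recall is 2/3 and F1 is 0.5.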