An implementation of RNNsearch using Theano. The model implementation is identical to the one in GroundHog.
This repository is deprecated. See RNNsearch
- Build vocabulary
- Build source vocabulary
python scripts/ --corpus zh.txt --output vocab.zh.pkl \
    --limit 30000 --groundhog
- Build target vocabulary
python scripts/ --corpus en.txt --output vocab.en.pkl \
    --limit 30000 --groundhog
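To illustrate what a size-limited vocabulary is, the sketch below counts whitespace-separated tokens and keeps the `limit` most frequent ones. It is not the repository's script: the function name `build_vocab`, the special tokens `<eos>` and `<unk>`, and the id layout are assumptions made for illustration, and the format written by `--groundhog` may differ.

```python
from collections import Counter


def build_vocab(lines, limit, specials=("<eos>", "<unk>")):
    """Map the `limit` most frequent tokens to integer ids (hypothetical layout)."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Special tokens take the first ids (an assumption, not the repository's format).
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in ranked[:limit]:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


if __name__ == "__main__":
    corpus = ["a b a c", "b a d"]
    print(build_vocab(corpus, limit=2))
```

Tokens that fall outside the limit would be mapped to the unknown-word id at training time, which is why the limit (30000 in the commands above) trades vocabulary coverage against model size.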
- Shuffle corpus (Optional)
python scripts/ --corpus zh.txt en.txt
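When shuffling a parallel corpus, both files must be permuted in the same order so that sentence pairs stay aligned. The sketch below shows the idea on in-memory lists. The function name `shuffle_parallel` and the `seed` parameter are hypothetical, and the repository's script may work differently (the training commands suggest it writes `zh.txt.shuf` and `en.txt.shuf`).

```python
import random


def shuffle_parallel(src_lines, tgt_lines, seed=1234):
    """Shuffle two aligned lists of sentences with the same permutation."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    order = list(range(len(src_lines)))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(order)
    return [src_lines[i] for i in order], [tgt_lines[i] for i in order]


if __name__ == "__main__":
    src = ["zh sentence 0", "zh sentence 1", "zh sentence 2"]
    tgt = ["en sentence 0", "en sentence 1", "en sentence 2"]
    for s, t in zip(*shuffle_parallel(src, tgt)):
        print(s, "|||", t)
```

Shuffling a single index list and applying it to both sides is what keeps the pairs aligned. Shuffling each file independently would destroy the alignment.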
- Training from random initialization
python train --corpus zh.txt.shuf en.txt.shuf \
    --vocab vocab.zh.pkl vocab.en.pkl --model nmt --embdim 620 620 \
    --hidden 1000 1000 1000 --maxhid 500 --deephid 620 --maxpart 2 \
    --alpha 5e-4 --norm 1.0 --batch 128 --maxepoch 5 --seed 1234 \
    --freq 1000 --vfreq 1500 --sfreq 50 --sort 20 --validation nist02.src \
    --references nist02.ref0 nist02.ref1 nist02.ref2 nist02.ref3
- Initialize with a trained model
python train --corpus zh.txt.shuf en.txt.shuf \
    --vocab vocab.zh.pkl vocab.en.pkl --model nmt --embdim 620 620 \
    --hidden 1000 1000 1000 --maxhid 500 --deephid 620 --maxpart 2 \
    --alpha 5e-4 --norm 1.0 --batch 128 --maxepoch 5 --seed 1234 \
    --freq 1000 --vfreq 1500 --sfreq 50 --sort 20 --validation nist02.src \
    --references nist02.ref0 nist02.ref1 nist02.ref2 nist02.ref3
- Resume training
python train --model nmt.autosave.pkl
- Translate
python translate --model < input > translation
Models trained by GroundHog can be converted to our format with the command below. Only the RNNsearch architecture is supported.
python scripts/ --state search_state.pkl --model search_model.npz \
    --output nmt.pkl
Models trained by older versions of this code can be converted as follows:
python scripts/ oldmodel.pkl newmodel.pkl