Why do I get low BLEU on zh-en NMT? #83
Comments
Hi JxuHenry, I'm curious where to obtain a monolingual corpus for Chinese. Could you share your experience? Thanks in advance.
Hi! The UN officially provides a parallel corpus from its conferences, but I only used the Chinese portion for training.
OK, thank you very much.
Hi JxuHenry, I have the same problem. Have you solved it?
No, I haven't, sorry.
Hi, how do you obtain the shared embeddings?
I only changed the corpus and then trained the model. The corpus preprocessing is the same as in the "get_data_enfr.sh" script.
The run parameters are as follows:
python main.py --exp_name zhTest --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,zh' --n_mono -1 --mono_dataset 'zh:./data/mono/all.zh.tok.60000.pth,,;en:./data/mono/all.en.tok.60000.pth,,' --para_dataset 'en-zh:,./data/para/newdev/newsdev2017-enzh-src.XX.60000.pth,./data/para/newdev/newsdev2017-zhen-ref.XX.60000.pth' --mono_directions 'zh,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-zh-en,zh-en-zh' --pretrained_emb './data/mono/all.zh-en.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_zh_en_valid,10
Do I need to modify other things?
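For readers checking the command above, the value passed to --lambda_xe_mono ('0:1,100000:0.1,300000:0') looks like a list of step:value pairs. The sketch below shows one way such a schedule could be evaluated, assuming it is read as linear interpolation between the listed points and held constant outside them. That reading is an assumption, and the names parse_schedule and schedule_value are hypothetical helpers for illustration, not functions from the repository.

```python
from bisect import bisect_right


def parse_schedule(spec):
    """Parse 'step:value,step:value,...' into a sorted list of (step, value) pairs."""
    points = []
    for item in spec.split(","):
        step, value = item.split(":")
        points.append((int(step), float(value)))
    return sorted(points)


def schedule_value(points, n_iter):
    """Return the schedule value at iteration n_iter.

    Assumption: linear interpolation between neighbouring points,
    constant before the first point and after the last one.
    """
    steps = [s for s, _ in points]
    i = bisect_right(steps, n_iter)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (x_a, y_a), (x_b, y_b) = points[i - 1], points[i]
    return y_a + (n_iter - x_a) * (y_b - y_a) / (x_b - x_a)


if __name__ == "__main__":
    points = parse_schedule("0:1,100000:0.1,300000:0")
    for n in (0, 50000, 100000, 200000, 300000):
        print(n, schedule_value(points, n))
```

Under this reading, the weight of the monolingual (denoising) loss would decay from 1 to 0.1 over the first 100000 iterations and reach 0 at iteration 300000, after which only the on-the-fly back-translation loss (--lambda_xe_otfd 1) would remain active.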