Why do I get low BLEU on zh-en NMT? #83

JxuHenry · 2019-04-16T00:35:26Z

I only changed the corpus and then trained the model. Corpus preprocessing is the same as in "get_data_enfr.sh".
The training command is as follows:
python main.py --exp_name zhTest --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,zh' --n_mono -1 --mono_dataset 'zh:./data/mono/all.zh.tok.60000.pth,,;en:./data/mono/all.en.tok.60000.pth,,' --para_dataset 'en-zh:,./data/para/newdev/newsdev2017-enzh-src.XX.60000.pth,./data/para/newdev/newsdev2017-zhen-ref.XX.60000.pth' --mono_directions 'zh,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-zh-en,zh-en-zh' --pretrained_emb './data/mono/all.zh-en.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_zh_en_valid,10
Do I need to modify anything else?
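In the command above, `--lambda_xe_mono '0:1,100000:0.1,300000:0'` is a schedule, not a single weight. The sketch below shows one way to read it. It assumes that each `step:value` pair is a breakpoint over training iterations, that the weight is linearly interpolated between breakpoints, and that the last value is held afterwards. I believe the UnsupervisedMT trainer behaves this way, but I have not verified it here. `parse_schedule` and `lambda_at` are hypothetical helpers, not the repo's API.

```python
from typing import List, Tuple


def parse_schedule(spec: str) -> List[Tuple[int, float]]:
    """Parse "step:value,step:value,..." or a bare constant such as "1".

    Assumed format (hypothetical helper, not UnsupervisedMT's own code).
    """
    if ":" not in spec:
        # A plain number (e.g. --lambda_xe_otfd 1) is a constant weight.
        return [(0, float(spec))]
    points = []
    for item in spec.split(","):
        step, value = item.split(":")
        points.append((int(step), float(value)))
    # Breakpoints are assumed to start at iteration 0 and strictly increase.
    assert points[0][0] == 0
    assert all(p[0] < q[0] for p, q in zip(points, points[1:]))
    return points


def lambda_at(points: List[Tuple[int, float]], n_iter: int) -> float:
    """Linearly interpolate between breakpoints; hold the last value after them."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= n_iter < x1:
            return y0 + (y1 - y0) * (n_iter - x0) / (x1 - x0)
    return points[-1][1]


schedule = parse_schedule("0:1,100000:0.1,300000:0")
for step in (0, 50_000, 100_000, 200_000, 300_000):
    print(step, round(lambda_at(schedule, step), 4))
```

Under these assumptions, the monolingual auto-encoding loss starts at full weight. It decays to 0.1 by iteration 100k and is switched off by 300k. The on-the-fly back-translation loss (`--lambda_xe_otfd 1`) keeps a constant weight of 1 throughout.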

HAOHAOXUEXI5776 · 2019-04-17T11:27:40Z

Hi JxuHenry, I'm curious where you obtained the Chinese monolingual corpus. Could you share your experience? Thanks in advance.

JxuHenry · 2019-04-18T00:35:16Z

Hi! The UN officially provides a parallel corpus of its conference documents, but I only used the Chinese side of it for training.

HAOHAOXUEXI5776 · 2019-04-18T11:48:09Z

Oh, thanks for your reply. Yesterday I found a nice Chinese corpus for multiple tasks, which also contains monolingual data. If you are interested, you can find it here.

JxuHenry · 2019-04-19T01:07:12Z

OK, thank you very much

cycao77 · 2019-06-03T08:52:46Z

Hi JxuHenry, I have the same problem. Have you solved it?

JxuHenry · 2019-09-06T06:35:59Z

No, I haven't, sorry.

JianLiu91 · 2019-10-24T15:18:33Z

Hi, how did you obtain the shared embeddings ./data/mono/all.zh-en.60000.vec?
Did you train them on the concatenated data using fastText?
Have you tried using MUSE to get aligned embeddings? I think it might help.
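The preprocessing script referenced in this thread (get_data_enfr.sh) builds its joint embeddings in two steps. It first concatenates the BPE-encoded monolingual corpora of both languages and shuffles the result. It then trains a single fastText skipgram model on that merged file, so both languages share one vocabulary and one embedding space. The sketch below covers only the first step, the equivalent of `cat a b | shuf > out`. `concat_and_shuffle` is a hypothetical helper, and the file names in the usage comment are assumptions modeled on the paths in the command. It holds everything in memory, so for full-size corpora the shell `cat ... | shuf` pipeline is the more practical choice.

```python
import random
from typing import Iterable


def concat_and_shuffle(inputs: Iterable[str], output: str, seed: int = 0) -> int:
    """Merge several tokenized/BPE'd corpora into one shuffled file.

    Every line from every input is written exactly once; only the order
    changes (like `cat a b | shuf`). Returns the number of lines written.
    Loads all lines into memory, so it suits small or medium corpora.
    """
    lines = []
    for path in inputs:
        with open(path, encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    # Mix the two languages so training does not see one language in a block.
    random.Random(seed).shuffle(lines)
    with open(output, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)


# Hypothetical usage (file names assumed, not taken from the thread):
# concat_and_shuffle(["all.en.tok.60000", "all.zh.tok.60000"], "all.zh-en.60000")
# A fastText skipgram run on the merged file would then produce the .vec file.
```

Embeddings trained jointly this way are not explicitly aligned across languages. A separate alignment step, such as MUSE, is what the suggestion above would add on top.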
