This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Why do I get low BLEU on zh-en NMT? #83

Open
JxuHenry opened this issue Apr 16, 2019 · 7 comments

Comments

@JxuHenry

JxuHenry commented Apr 16, 2019

I only changed the corpus and then trained the model. The corpus preprocessing is the same as in the "get_data_enfr.sh" script.
The run parameters are as follows:

python main.py --exp_name zhTest --transformer True \
  --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 \
  --share_lang_emb True --share_output_emb True \
  --langs 'en,zh' --n_mono -1 \
  --mono_dataset 'zh:./data/mono/all.zh.tok.60000.pth,,;en:./data/mono/all.en.tok.60000.pth,,' \
  --para_dataset 'en-zh:,./data/para/newdev/newsdev2017-enzh-src.XX.60000.pth,./data/para/newdev/newsdev2017-zhen-ref.XX.60000.pth' \
  --mono_directions 'zh,en' \
  --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 \
  --pivo_directions 'en-zh-en,zh-en-zh' \
  --pretrained_emb './data/mono/all.zh-en.60000.vec' --pretrained_out True \
  --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 \
  --otf_num_processes 30 --otf_sync_params_every 1000 \
  --enc_optimizer adam,lr=0.0001 \
  --epoch_size 500000 \
  --stopping_criterion bleu_zh_en_valid,10
Do I need to modify anything else?
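
For anyone reading the command above, the value passed to --lambda_xe_mono looks like a schedule rather than a single number. Here is a minimal sketch of how such a string could be interpreted, assuming it is a comma-separated list of step:weight pairs with linear interpolation between consecutive pairs. That reading is an assumption, and the helper names parse_schedule and weight_at are hypothetical, not functions from the repository.

```python
def parse_schedule(spec):
    """Parse 'step:weight,step:weight,...' into a sorted list of (int, float) pairs."""
    points = []
    for item in spec.split(","):
        step, weight = item.split(":")
        points.append((int(step), float(weight)))
    points.sort()
    return points


def weight_at(points, step):
    """Return the weight at a training step, interpolating linearly between points.

    Assumes the steps in `points` are distinct. Before the first point the first
    weight is used; after the last point the last weight is used.
    """
    if step <= points[0][0]:
        return points[0][1]
    for (s0, w0), (s1, w1) in zip(points, points[1:]):
        if step <= s1:
            return w0 + (w1 - w0) * (step - s0) / (s1 - s0)
    return points[-1][1]


# Schedule string taken from the command in this issue.
points = parse_schedule("0:1,100000:0.1,300000:0")
for step in (0, 50000, 100000, 200000, 300000, 400000):
    print(step, round(weight_at(points, step), 4))
```

Under this reading, the denoising auto-encoder loss weight starts at 1, falls to 0.1 by step 100000, and reaches 0 at step 300000, after which only the back-translation loss (--lambda_xe_otfd) would remain active.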

@HAOHAOXUEXI5776

Hi JxuHenry, I'm curious where you obtained the Chinese monolingual corpus. Could you share your experience? Thanks in advance.

@JxuHenry
Author

> Hi JxuHenry, I'm curious where you obtained the Chinese monolingual corpus. Could you share your experience? Thanks in advance.

Hi! The UN officially provides a parallel corpus of conference documents, but I only used the Chinese corpus for training.

@HAOHAOXUEXI5776

> Hi! The UN officially provides a parallel corpus of conference documents, but I only used the Chinese corpus for training.

Oh, thanks for your reply. Yesterday I found a nice Chinese corpus for multiple tasks, which also contains a monolingual corpus. If you are interested, you can find it here

@JxuHenry
Author

> > Hi! The UN officially provides a parallel corpus of conference documents, but I only used the Chinese corpus for training.
>
> Oh, thanks for your reply. Yesterday I found a nice Chinese corpus for multiple tasks, which also contains a monolingual corpus. If you are interested, you can find it here

OK, thank you very much

@cycao77

cycao77 commented Jun 3, 2019

Hi JxuHenry, I have the same problem. Have you solved it?

@JxuHenry
Author

JxuHenry commented Sep 6, 2019

> Hi JxuHenry, I have the same problem. Have you solved it?

No, I haven't, sorry.

@JianLiu91

JianLiu91 commented Oct 24, 2019

Hi, how did you obtain the shared embeddings ./data/mono/all.zh-en.60000.vec?
Did you train them on the concatenated data using fastText?
Have you tried using MUSE to get aligned embeddings? I think it might help.
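
One rough way to check whether the shared embeddings are aligned across the two languages is to compare the vectors of a few known translation pairs. Below is a minimal sketch, assuming the .vec file uses the plain-text format written by fastText (a header line with the word count and dimension, then one word followed by its vector per line). The helper names load_vec and cosine are hypothetical, and the toy data stands in for the real file.

```python
import io
import math


def load_vec(handle, max_words=None):
    """Read text-format embeddings from an open file handle into a dict."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<word count> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if max_words is not None and len(vectors) >= max_words:
            break
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for the real file; with real data, pass
# open(path, encoding="utf-8") to load_vec instead.
toy = io.StringIO("3 2\ncat 1.0 0.0\n猫 0.9 0.1\nhouse 0.0 1.0\n")
emb = load_vec(toy)
print(cosine(emb["cat"], emb["猫"]))
print(cosine(emb["cat"], emb["house"]))
```

If obvious translation pairs score no higher than unrelated words, the two languages probably occupy separate regions of the embedding space, and an explicit alignment step may be worth trying.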
