With the IWSLT17 Chinese-English dataset, BLEU keeps rising during training with no sign of convergence, but the model generalizes poorly to the test set #93

Open
edwardelric1202 opened this issue Jul 2, 2020 · 1 comment

edwardelric1202 commented Jul 2, 2020

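I am using the IWSLT17 Chinese-English dataset with a Transformer model. During training the BLEU score keeps rising and does not converge. What causes this? Is it related to the hyperparameter settings?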
INFO:tensorflow:BLEU at step 10000: 0.110296
INFO:tensorflow:BLEU at step 20000: 0.144964
INFO:tensorflow:BLEU at step 30000: 0.178070
INFO:tensorflow:BLEU at step 40000: 0.198967
INFO:tensorflow:BLEU at step 50000: 0.222250
INFO:tensorflow:BLEU at step 60000: 0.245278
INFO:tensorflow:BLEU at step 70000: 0.266681
INFO:tensorflow:BLEU at step 80000: 0.286975
INFO:tensorflow:BLEU at step 90000: 0.308338
INFO:tensorflow:BLEU at step 100000: 0.324188

The initial parameter settings are as follows:
--parameters=batch_size=2048,device_list=[0],train_steps=100000,eval_steps=2000,update_cycle=4

On the test set (newstest), the BLEU score is only around 11.

Playinf (Collaborator) commented Aug 19, 2020

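First, it is not clear which development set was used here. IWSLT is a spoken-language dataset and is relatively small, while newstest is a news dataset. The two domains are very different, so a low BLEU on newstest is understandable. In addition, the BLEU reported during training is generally computed on the BPE-segmented text rather than on the tokenized text, and that value tends to be higher.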
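To illustrate the last point, here is a minimal sketch of why BLEU on BPE-segmented text tends to come out higher than BLEU on word-level tokens. It assumes subword-nmt style `@@` continuation markers, and it uses a small unsmoothed BLEU written only for this illustration. The sentences and the helper names `undo_bpe` and `corpus_bleu` are hypothetical and are not taken from the training code.

```python
import math
import re
from collections import Counter


def undo_bpe(line):
    # Assumption: subword-nmt style segmentation, where "@@" marks a
    # subword that continues into the next token.
    return re.sub(r"@@( |$)", "", line)


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU in [0, 1], one reference per hypothesis."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Counter & Counter keeps the elementwise minimum (clipped counts).
            matched[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            total[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = min(1.0, math.exp(1.0 - ref_len / hyp_len))
    return brevity_penalty * math.exp(log_precision)


# Hypothetical example: the hypothesis gets one word wrong ("discussing"
# instead of "discussed"), but two of that word's three subwords still match.
ref_bpe = "we dis@@ cuss@@ ed the trans@@ lation qual@@ ity today"
hyp_bpe = "we dis@@ cuss@@ ing the trans@@ lation qual@@ ity today"

bleu_bpe = corpus_bleu([hyp_bpe.split()], [ref_bpe.split()])
bleu_word = corpus_bleu([undo_bpe(hyp_bpe).split()], [undo_bpe(ref_bpe).split()])

print("BLEU on BPE tokens: %.3f" % bleu_bpe)
print("BLEU on words:      %.3f" % bleu_word)
```

In this toy case a wrong word still earns partial credit for its matching subwords, so the BPE-level score is the higher of the two. For a comparable number on a test set, the usual practice is to merge the subwords back into words before scoring.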
