Based on https://github.com/zhuchen03/FreeLB/blob/master/fairseq-RoBERTa/fairseq/tasks/sentence_prediction.py#L103, I implemented FreeLB at the fine-tuning stage for the GLM model, and I have four questions.
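For reference, this is roughly the inner loop I implemented. It is only a minimal PyTorch sketch: the HuggingFace-style forward signature (`inputs_embeds`, `attention_mask`, `labels`), the `freelb_backward` name, and the hyperparameter defaults are my own assumptions, not the exact FreeLB or GLM code.

```python
import torch

def freelb_backward(model, input_ids, input_mask, labels,
                    adv_steps=3, adv_lr=1e-1, adv_init_mag=1e-1, adv_max_norm=1.0):
    """Accumulate FreeLB gradients into the model parameters; call optimizer.step() afterwards."""
    # Initialize the perturbation inside a small ball, zeroed on padding positions.
    embeds = model.get_input_embeddings()(input_ids)
    delta = torch.zeros_like(embeds).uniform_(-1, 1) * input_mask.unsqueeze(-1).float()
    lengths = input_mask.sum(1).clamp(min=1).view(-1, 1, 1).float()
    delta = delta * adv_init_mag / lengths.sqrt()
    delta.requires_grad_()

    for _ in range(adv_steps):
        loss = model(inputs_embeds=embeds + delta,
                     attention_mask=input_mask, labels=labels).loss
        # "Free" training: average the parameter gradients over the ascent steps.
        (loss / adv_steps).backward()

        # Gradient ascent on the perturbation, normalized per example.
        grad = delta.grad.detach()
        g_norm = grad.view(grad.size(0), -1).norm(p=2, dim=1).clamp(min=1e-8).view(-1, 1, 1)
        delta = delta.detach() + adv_lr * grad / g_norm
        # Project back into the L2 ball of radius adv_max_norm.
        d_norm = delta.view(delta.size(0), -1).norm(p=2, dim=1).clamp(min=1e-8).view(-1, 1, 1)
        delta = delta * (adv_max_norm / d_norm).clamp(max=1.0)
        delta.requires_grad_()

        # Re-embed the inputs, since backward() freed the previous graph.
        embeds = model.get_input_embeddings()(input_ids)
    return loss.detach()
```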
First, how should I construct <input_mask> for the GLM model? Is it correct that all padding-token positions should be 0 in <input_mask>? Do I also need to set any other positions to 0 based on <input_ids>? This is not discussed in the paper.
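My current assumption is that only padding positions are masked out and everything else stays 1, something like the sketch below (`pad_token_id` is whatever the GLM tokenizer uses; that is my assumption):

```python
import torch

def build_input_mask(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # 1 for real tokens, 0 for padding; all non-padding positions stay 1.
    return (input_ids != pad_token_id).long()
```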
Second, if I set <adv_begin_iter> to -1, the optimization gets stuck with NaNs. But if I set <adv_begin_iter> to 20 or a larger number, the NaN issue disappears. Did you encounter the same issue in your experiments, or is there another way to fix the NaN problem?
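For context, this is how I understand and use <adv_begin_iter>: with -1 the adversarial loop runs from the very first update, which is where I see the NaNs. The sketch below continues the one above; `num_updates` is my own name for the global update counter.

```python
def training_step(model, batch, num_updates: int, adv_begin_iter: int):
    input_ids, input_mask, labels = batch
    if num_updates > adv_begin_iter:
        # FreeLB updates; with adv_begin_iter = -1 this branch runs from the start.
        return freelb_backward(model, input_ids, input_mask, labels)
    # Warm-up phase: plain non-adversarial updates.
    loss = model(input_ids=input_ids, attention_mask=input_mask, labels=labels).loss
    loss.backward()
    return loss.detach()
```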
Third, I noticed you don't use <adv_begin_iter> in your BERT script (https://github.com/zhuchen03/FreeLB/blob/master/huggingface-transformers/examples/run_glue_freelb.py#L224). Does this mean bert-base is more stable than RoBERTa, or does <adv_begin_iter> simply need to differ between models?
Finally, where can I find the code for the "when adversarial training meets dropout" part of the paper?
Looking forward to your response. Thanks!