Based on https://github.com/zhuchen03/FreeLB/blob/master/fairseq-RoBERTa/fairseq/tasks/sentence_prediction.py#L103, I implemented FreeLB at the fine-tuning stage for the GLM model, and I have four questions.
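For reference, this is roughly the inner loop I implemented. It is only a minimal PyTorch sketch: the HuggingFace-style forward signature (`inputs_embeds`, `attention_mask`, `labels`), the `freelb_backward` name, and the hyperparameter defaults are my own assumptions, not the exact FreeLB or GLM code.

```python
import torch

def freelb_backward(model, input_ids, input_mask, labels,
                    adv_steps=3, adv_lr=1e-1, adv_init_mag=1e-1, adv_max_norm=1.0):
    """Accumulate FreeLB gradients into the model parameters; call optimizer.step() afterwards."""
    # Initialize the perturbation inside a small ball, zeroed on padding positions.
    embeds = model.get_input_embeddings()(input_ids)
    delta = torch.zeros_like(embeds).uniform_(-1, 1) * input_mask.unsqueeze(-1).float()
    lengths = input_mask.sum(1).clamp(min=1).view(-1, 1, 1).float()
    delta = delta * adv_init_mag / lengths.sqrt()
    delta.requires_grad_()

    for _ in range(adv_steps):
        loss = model(inputs_embeds=embeds + delta,
                     attention_mask=input_mask, labels=labels).loss
        # "Free" training: average the parameter gradients over the ascent steps.
        (loss / adv_steps).backward()

        # Gradient ascent on the perturbation, normalized per example.
        grad = delta.grad.detach()
        g_norm = grad.view(grad.size(0), -1).norm(p=2, dim=1).clamp(min=1e-8).view(-1, 1, 1)
        delta = delta.detach() + adv_lr * grad / g_norm
        # Project back into the L2 ball of radius adv_max_norm.
        d_norm = delta.view(delta.size(0), -1).norm(p=2, dim=1).clamp(min=1e-8).view(-1, 1, 1)
        delta = delta * (adv_max_norm / d_norm).clamp(max=1.0)
        delta.requires_grad_()

        # Re-embed the inputs, since backward() freed the previous graph.
        embeds = model.get_input_embeddings()(input_ids)
    return loss.detach()
```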
First, how should I construct <input_mask> for the GLM model? Is it correct that all padding-token positions should be 0 in <input_mask>? Do I also need to set any other positions to 0 based on <input_ids>? This is not discussed in the paper.
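My current assumption is that only padding positions are masked out and everything else stays 1, something like the sketch below (`pad_token_id` is whatever the GLM tokenizer uses; that is my assumption):

```python
import torch

def build_input_mask(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # 1 for real tokens, 0 for padding; all non-padding positions stay 1.
    return (input_ids != pad_token_id).long()
```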
Second, if I set <adv_begin_iter> to -1, the optimization gets stuck with NaNs. But if I set <adv_begin_iter> to 20 or a larger number, the NaN issue disappears. Did you encounter the same issue in your experiments, or is there another way to fix the NaN problem?
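For context, this is how I understand and use <adv_begin_iter>: with -1 the adversarial loop runs from the very first update, which is where I see the NaNs. The sketch below continues the one above; `num_updates` is my own name for the global update counter.

```python
def training_step(model, batch, num_updates: int, adv_begin_iter: int):
    input_ids, input_mask, labels = batch
    if num_updates > adv_begin_iter:
        # FreeLB updates; with adv_begin_iter = -1 this branch runs from the start.
        return freelb_backward(model, input_ids, input_mask, labels)
    # Warm-up phase: plain non-adversarial updates.
    loss = model(input_ids=input_ids, attention_mask=input_mask, labels=labels).loss
    loss.backward()
    return loss.detach()
```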
Third, I noticed you don't use <adv_begin_iter> in your BERT script (https://github.com/zhuchen03/FreeLB/blob/master/huggingface-transformers/examples/run_glue_freelb.py#L224). Does this mean bert-base is more stable than RoBERTa, or does <adv_begin_iter> simply need to differ between models?
Finally, where can I find the code for the "when adversarial training meets dropout" part of the paper?
Looking forward to your response. Thanks!