Hello, I found that this function in class SimMIM outputs {'encoder.cpb_mlp', 'encoder.logit_scale', 'encoder.relative_position_bias_table'}:
Swin-Transformer/models/simmim.py
Lines 154 to 158 in f82860b
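If I read the linked lines correctly, that output comes from prefixing the encoder's keywords with 'encoder.'. A small self-contained sketch of that behavior (my paraphrase, not the exact source):

```python
# Sketch of how the set above is produced (paraphrased from the linked lines):
# the encoder's no-weight-decay keywords get an 'encoder.' prefix.
encoder_keywords = {'cpb_mlp', 'logit_scale', 'relative_position_bias_table'}
simmim_keywords = {'encoder.' + k for k in encoder_keywords}
print(simmim_keywords)
# {'encoder.cpb_mlp', 'encoder.logit_scale', 'encoder.relative_position_bias_table'}
```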
When this set is passed into build_optimizer, it eventually reaches check_keywords_in_name(name, skip_keywords), which decides whether a parameter's weight decay should be set to 0:

Swin-Transformer/optimizer.py
Lines 76 to 81 in f82860b
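To make the problem concrete, here is a small runnable sketch of that substring check (paraphrased; the exact code is at the lines linked above) applied to a real parameter name:

```python
def check_keywords_in_name(name, keywords=()):
    # Paraphrase of the check in optimizer.py: a parameter is excluded from
    # weight decay if any keyword occurs as a substring of its name.
    return any(keyword in name for keyword in keywords)

skip_keywords = {'encoder.cpb_mlp', 'encoder.logit_scale',
                 'encoder.relative_position_bias_table'}
param_name = 'encoder.layers.0.blocks.0.attn.cpb_mlp.0.bias'

# 'encoder.cpb_mlp' is not a substring of the full parameter name,
# so the parameter keeps its default weight decay.
print(check_keywords_in_name(param_name, skip_keywords))  # False
print(check_keywords_in_name(param_name, {'cpb_mlp'}))    # True
```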
Sadly, 'encoder.cpb_mlp' in 'encoder.layers.0.blocks.0.attn.cpb_mlp.0.bias' == False, which means the weight decay of cpb_mlp is not set to 0 during pretraining. The right implementation of no_weight_decay_keywords would be one whose keywords actually match the full parameter names, along the lines of the sketch below.
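A minimal sketch of one possible fix, assuming the intent is simply to return keywords that the substring check can match, i.e. drop the 'encoder.' prefix (this is my own guess, not the authors' code; the @torch.jit.ignore decorator just mirrors the neighboring methods):

```python
import torch

# Possible replacement for SimMIM.no_weight_decay_keywords (my sketch):
# return the encoder's keywords unchanged, so that e.g. 'cpb_mlp' is a
# substring of 'encoder.layers.0.blocks.0.attn.cpb_mlp.0.bias' and the
# parameter is correctly excluded from weight decay.
@torch.jit.ignore
def no_weight_decay_keywords(self):
    if hasattr(self.encoder, 'no_weight_decay_keywords'):
        return self.encoder.no_weight_decay_keywords()
    return set()
```

With this, check_keywords_in_name would match the cpb_mlp parameters and give them zero weight decay, which I believe is the intended behavior.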
Is this an intentional behavior or a bug? I appreciate your help!