no weight decay setting error in SimMIM pretraining #369

Open
wanghaoyucn opened this issue Sep 26, 2024 · 0 comments
wanghaoyucn commented Sep 26, 2024

@torch.jit.ignore
def no_weight_decay_keywords(self):
    if hasattr(self.encoder, 'no_weight_decay_keywords'):
        return {'encoder.' + i for i in self.encoder.no_weight_decay_keywords()}
    return {}

Hello, I found that this function in class SimMIM returns {'encoder.cpb_mlp', 'encoder.logit_scale', 'encoder.relative_position_bias_table'}. When this set is passed into build_optimizer, it eventually calls check_keywords_in_name(name, skip_keywords) to decide whether a parameter's weight decay should be set to 0.

def check_keywords_in_name(name, keywords=()):
    isin = False
    for keyword in keywords:
        if keyword in name:
            isin = True
    return isin
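
For context, the parameter grouping inside build_optimizer works roughly like the following during pretraining (a minimal sketch under my reading of the code, not the repository's exact implementation; get_param_groups is a hypothetical name):

def get_param_groups(model, skip_list=(), skip_keywords=()):
    # Sketch: parameters whose names match a skip keyword (or appear in
    # skip_list) go into the group whose weight decay is forced to 0.
    has_decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if name in skip_list or check_keywords_in_name(name, skip_keywords):
            no_decay.append(param)
        else:
            has_decay.append(param)
    return [{'params': has_decay},
            {'params': no_decay, 'weight_decay': 0.}]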

Sadly, 'encoder.cpb_mlp' in 'encoder.layers.0.blocks.0.attn.cpb_mlp.0.bias' evaluates to False, because cpb_mlp sits several modules below the encoder, so the prefixed keyword is never a substring of the actual parameter name. This means the weight decay of cpb_mlp is not set to 0 during pretraining. A correct implementation of no_weight_decay_keywords would be:

@torch.jit.ignore
def no_weight_decay_keywords(self):
    if hasattr(self.encoder, 'no_weight_decay_keywords'):
        return {i for i in self.encoder.no_weight_decay_keywords()}
    return {}
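
A quick check with one encoder parameter name makes the difference visible (this just reuses check_keywords_in_name from above on the example name already mentioned):

name = 'encoder.layers.0.blocks.0.attn.cpb_mlp.0.bias'

print(check_keywords_in_name(name, {'encoder.cpb_mlp'}))  # False: the prefixed keyword never matches
print(check_keywords_in_name(name, {'cpb_mlp'}))          # True: the unprefixed keyword matches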

Is this intentional behavior or a bug? I appreciate your help!
