Usage & concept questions #18
Just a quick question:

```python
wd_dict = get_weight_decays(model)  # {'lstm_1/recurrent:0': 0, 'output/kernel:0': 0}
optimizer = AdamW(lr=1e-4, weight_decays=weight_decays, lr_multipliers=lr_multipliers,
```

If I understand correctly, `weight_decays` is similar to the L2 penalty. What does `lr_multipliers` really stand for? Do we have to give it the same name as the input layer ("lstm_1")? `use_cosine_annealing` means we use a large learning rate after some time, right?

Thank you for your help. I think your repo is way better than any other AdamW version in Keras.
@ChongWu-Biostat You're welcome, glad you find it useful. Suppose I'll make a more detailed example to explain in case the README didn't suffice, but for now I'll respond to your questions:

**Weight decays vs. L2 penalty**

The key difference between the two is where the penalty acts. An L2 penalty is added to the loss, so it reaches the weights only through the gradients; with an adaptive optimizer like Adam, the decay term then gets rescaled by the running gradient moments. As a result, (1) the effective decay differs per weight and drifts with gradient magnitudes, and (2) the regularization strength is entangled with the learning rate, so the two can't be tuned independently.
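In update-rule form, the contrast looks roughly like this (a minimal sketch using plain SGD for readability; under Adam the `grad + wd * w` term would additionally be divided by the running second-moment estimate, which is exactly what distorts the decay):

```python
import numpy as np

lr, wd = 1e-2, 1e-4          # illustrative values, not the repo's defaults
w = np.random.randn(10)      # a weight tensor
grad = np.random.randn(10)   # gradient of the loss w.r.t. w

# L2 penalty: the decay is folded into the gradient, so any gradient
# rescaling the optimizer performs also rescales the decay
w_l2 = w - lr * (grad + wd * w)

# Decoupled weight decay: the gradient step and the decay step are
# separate, so the decay rate stays fixed and independent of the loss
w_decoupled = w - lr * grad - wd * w
```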
By fixing the weight decay rate and separating it from the loss, all of the above are remedied.

**How to use**

Suppose you have a model:
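Something like the following (a sketch only: the layer names, shapes, and hyperparameter values are illustrative, and the import paths are assumed to match the snippet in your question):

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from keras_adamw import AdamW
from keras_adamw.utils import get_weight_decays

ipt = Input(shape=(120, 4))
x = LSTM(16, name='lstm_1')(ipt)
out = Dense(1, activation='sigmoid', name='output')(x)
model = Model(ipt, out)

# Fetch all decayable weight tensors, then assign each a decay rate
wd_dict = get_weight_decays(model)  # {'lstm_1/recurrent:0': 0, 'output/kernel:0': 0}
weight_decays = {name: 1e-4 for name in wd_dict}

# Per-layer learning-rate multipliers, keyed by (sub)strings of layer names
lr_multipliers = {'lstm_1': 0.5}  # all 'lstm_1' weights train at half the base lr

optimizer = AdamW(lr=1e-4, weight_decays=weight_decays,
                  lr_multipliers=lr_multipliers,
                  use_cosine_annealing=True, total_iterations=1000)
model.compile(optimizer, loss='binary_crossentropy')
```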
Names don't have to match exactly; substrings also work, so `lr_multipliers = {'lstm_1': 0.5}` applies to every weight whose name contains `'lstm_1'`.

**What is cosine annealing?**
It's not quite "a large learning rate after some time": with `use_cosine_annealing=True`, the learning rate decays from its maximum toward zero along a cosine curve over `total_iterations`, then restarts at the maximum for the next cycle. For example, at approximately the midpoint of a cycle, the learning rate sits at about half its maximum.
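The schedule itself is simple enough to write down (a sketch; the function name and default values here are illustrative, not the repo's API):

```python
import math

def cosine_annealed_lr(t, total_iterations, eta_max=1e-3, eta_min=0.0):
    """Cosine-annealed learning rate at iteration t, restarting each cycle."""
    frac = (t % total_iterations) / total_iterations  # position within the cycle
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * frac))

print(cosine_annealed_lr(0, 1000))     # 0.001   (cycle start: maximum lr)
print(cosine_annealed_lr(500, 1000))   # 0.0005  (midpoint: half the maximum)
print(cosine_annealed_lr(999, 1000))   # ~0.0    (cycle end: near minimum)
print(cosine_annealed_lr(1000, 1000))  # 0.001   (restart)
```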
Got it. Thank you for your explanation. I understand it now.
It works perfectly for me. Thank you for sharing and developing this repo. I think this idea really works (at least for my problem).
Thanks,
Chong