Separate optimization for trunk and embedder #270

Answered by KevinMusgrave
lfleck asked this question in Q&A

It is common in metric learning papers to use a different learning rate for the trunk and embedder.

  • The trunk is usually a model pretrained on ImageNet, so papers use a lower learning rate on that part.
  • The embedder is usually just a single fully connected layer that is randomly initialized, so they use a higher learning rate on that part.
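
For what it's worth, here is a minimal sketch of that setup in plain PyTorch, with one optimizer per component. The tiny `trunk` and `embedder` modules, the layer sizes, the learning rate values, and the `train_step` helper are all assumptions made up for illustration; in practice the trunk would be a pretrained backbone and the loss would be a metric learning loss.

```python
import torch
from torch import nn

# Hypothetical stand-ins: a real trunk would be an ImageNet-pretrained backbone.
trunk = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
# The embedder: a single, randomly initialized fully connected layer.
embedder = nn.Linear(16, 8)

# Assumed values, chosen only to show the 10x ratio between the two parts.
embedder_lr = 1e-4
trunk_lr = embedder_lr / 10

# One optimizer per component, each with its own learning rate.
trunk_optimizer = torch.optim.Adam(trunk.parameters(), lr=trunk_lr)
embedder_optimizer = torch.optim.Adam(embedder.parameters(), lr=embedder_lr)


def train_step(inputs, loss_fn):
    """Run one update of both components and return the loss as a float."""
    trunk_optimizer.zero_grad()
    embedder_optimizer.zero_grad()
    embeddings = embedder(trunk(inputs))
    loss = loss_fn(embeddings)
    loss.backward()
    trunk_optimizer.step()
    embedder_optimizer.step()
    return loss.item()


# Placeholder loss, only so the sketch runs end to end.
example_loss = train_step(torch.randn(4, 32), lambda e: e.pow(2).mean())
```

If you would rather keep a single optimizer, PyTorch's per-parameter options should give the same effect: pass a list of dicts such as `[{"params": trunk.parameters(), "lr": trunk_lr}, {"params": embedder.parameters()}]` with `lr=embedder_lr` as the default.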

This CVPR 2020 paper mentions using a 10 times lower learning rate on the trunk model:

Answer selected by KevinMusgrave