Separate optimization for trunk and embedder #270

lfleck · 2021-01-30T19:23:52Z

Hi guys,
your examples (e.g. TwoStreamMetricLoss) use separate optimizers for the trunk and the embedder.
Is that just to illustrate how to handle the trunk and embedder in general, or is there a design decision behind it? If it's the latter, do you have any references for it? Most of the literature I've seen so far trains them jointly (e.g. SimCLR). Many thanks!

KevinMusgrave · 2021-01-30T23:57:22Z · Maintainer

It is common in metric learning papers to use different learning rates for the trunk and the embedder.

The trunk is usually a model pretrained on ImageNet, so papers use a lower learning rate for that part.
The embedder is usually just a single randomly initialized fully connected layer, so they use a higher learning rate for that part.
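A minimal plain-PyTorch sketch of the two ways to set this up. It assumes small stand-in modules (a two-layer `trunk` instead of an ImageNet-pretrained backbone, a single `nn.Linear` as the `embedder`), Adam, an arbitrary base learning rate of 1e-4 with the trunk 10 times lower, and a placeholder MSE loss instead of a real metric-learning loss; `train_step` is a hypothetical helper, not a library function. In both variants one loss is backpropagated through the composed model, so the trunk and embedder are still trained jointly; only the step size differs per part.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins (assumption): in practice the trunk would be an ImageNet-pretrained
# backbone with its classification layer removed.
trunk = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
# Embedder: a single randomly initialized fully connected layer.
embedder = nn.Linear(64, 16)

base_lr = 1e-4           # assumed value, for illustration only
trunk_lr = base_lr / 10  # pretrained trunk gets a 10x lower learning rate

# Variant 1: one optimizer per part (separate trunk and embedder optimizers).
trunk_optimizer = torch.optim.Adam(trunk.parameters(), lr=trunk_lr)
embedder_optimizer = torch.optim.Adam(embedder.parameters(), lr=base_lr)

# Variant 2: a single optimizer with one parameter group per part.
joint_optimizer = torch.optim.Adam(
    [
        {"params": trunk.parameters(), "lr": trunk_lr},
        {"params": embedder.parameters(), "lr": base_lr},
    ]
)


def train_step(x, targets, optimizers):
    """Hypothetical helper: one end-to-end update of trunk + embedder."""
    for opt in optimizers:
        opt.zero_grad()
    embeddings = embedder(trunk(x))
    # Placeholder loss (assumption); a real setup would use a metric-learning
    # loss computed from the embeddings and class labels.
    loss = nn.functional.mse_loss(embeddings, targets)
    loss.backward()  # a single backward pass fills gradients for both parts
    for opt in optimizers:
        opt.step()  # each part is updated with its own learning rate
    return loss.item()


# Usage: in a real training loop you would pick one of the two variants.
x = torch.randn(8, 32)
targets = torch.randn(8, 16)
print(train_step(x, targets, [trunk_optimizer, embedder_optimizer]))
```

Since SGD and Adam keep their state per parameter, two optimizers over disjoint parameter sets (both stepped every iteration) behave the same as one optimizer with two parameter groups. The split is mainly a convenient way to assign each part its own learning rate.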

This CVPR 2020 paper mentions using a learning rate 10 times lower for the trunk model:
