The updated learning rate is different for every parameter in AdamHD #9
Comments
I suppose that 'p', instead of being a single parameter, represents a tensor containing all the parameters... is that so, @gbaydin?
Hello @gbaydin, when …
If this is the case, the updated learning rate would be different for each of the parameter tensors in every optimization step, I suppose.
@gbaydin Can you please clarify this? Thanks.
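For context, here is a minimal sketch (plain NumPy, hypothetical names, not the repo's actual code) of the pattern being described: the learning rate is a single scalar that gets updated inside the loop over parameter tensors, so each tensor is stepped with a slightly different value. The SGD-HD hypergradient (current gradient dotted with the previous update direction) is used for brevity; AdamHD would use the previous Adam direction instead.

```python
# Sketch only: learning rate updated *inside* the per-tensor loop,
# so each parameter tensor sees a different, successively modified lr.
import numpy as np

def hd_step_per_tensor(params, grads, prev_updates, lr, hypergrad_lr):
    """params, grads, prev_updates: lists of NumPy arrays; lr: scalar."""
    for p, g, u_prev in zip(params, grads, prev_updates):
        h = np.sum(g * u_prev)        # hypergradient from this tensor only
        lr = lr + hypergrad_lr * h    # lr changes before the next tensor is reached
        p -= lr * g                   # this tensor is already stepped with the modified lr
    return lr
```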
Hey,
First, nice work! :)
I'm referring to the Adam version (AdamHD); the SGD version doesn't seem to have this problem.
If I understand the paper correctly, the gradient with respect to all parameters is used to update the learning rate. The learning rate is then updated once per optimization step and afterwards used for the gradient descent update of the parameters.
In your implementation, however, the learning rate is successively updated from each parameter's gradient (inside the optimizer's loop over the parameters) and then directly used for the gradient descent update of that parameter.
This effectively gives every parameter a different learning rate, since the rate is modified step by step within the loop; only the last parameters in the backpropagation order are updated with the learning rate that has received the "full" hypergradient step.
Am I missing something? Thanks for your help :)
Kind regards, Heiner
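By contrast, here is a sketch of the reading described above (same hypothetical NumPy setup, again not the repo's code): accumulate the hypergradient across all parameter tensors, update the learning rate once, and only then apply that single rate to every tensor.

```python
# Sketch only: one learning-rate update per optimization step,
# computed from the hypergradient summed over *all* parameter tensors.
import numpy as np

def hd_step_global(params, grads, prev_updates, lr, hypergrad_lr):
    """params, grads, prev_updates: lists of NumPy arrays; lr: scalar."""
    # full hypergradient: dot product of the (conceptually flattened) gradient
    # with the previous update direction, summed over all tensors
    h = sum(np.sum(g * u_prev) for g, u_prev in zip(grads, prev_updates))
    lr = lr + hypergrad_lr * h        # single lr update for this step
    for p, g in zip(params, grads):
        p -= lr * g                   # every tensor is stepped with the same lr
    return lr
```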