Learning rate increases? #4
Comments
Well, I don't know about the equation but practically, it first increases the learning rate if it is too low. @gbaydin I also experienced that the learning rate sometimes becomes negative, especially if hypergrad_lr is high. Should we maybe place a constraint (e.g. clipping) to prevent that from happening?
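A minimal sketch of the clipping idea, assuming the additive hypergradient update from the paper; the function and parameter names here are illustrative, not the repository's API:

```python
import torch

# Hypothetical sketch of the clipping suggestion: perform the additive
# hypergradient update alpha_t = alpha_{t-1} + beta * (grad . prev_grad),
# then clamp alpha so it can never go negative.
def clipped_hypergrad_update(alpha, beta, grad, prev_grad, min_alpha=0.0):
    h = torch.dot(grad.flatten(), prev_grad.flatten()).item()
    return max(alpha + beta * h, min_alpha)  # clip to keep alpha >= 0
```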
By modifying the algorithms described in the original paper, adam-hd worked fine for me. This thesis (multiplicative hypergradient descent) helped. A negative learning rate can be seen in the original experiments, but as I understand it, that is accepted. Some clipping might help, though.
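One way to realize a multiplicative rule is to take the hypergradient step in log-space, which keeps the learning rate positive by construction. This is a sketch of my own, an assumption rather than necessarily the thesis's exact formulation:

```python
import math
import torch

# Sketch of a multiplicative-style rule (an assumption on my part, not
# necessarily the thesis's exact form): update log(alpha) instead of alpha,
# so alpha = exp(log_alpha) stays strictly positive by construction.
def multiplicative_hypergrad_update(alpha, beta, grad, prev_grad):
    h = torch.dot(grad.flatten(), prev_grad.flatten()).item()
    return alpha * math.exp(beta * h)  # alpha > 0 is preserved
```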
@myui What do you mean by fine? What exactly was wrong with the version that is in this repository?
@akaniklaus learning rates monotonically increased under certain conditions, because of the dot-product behavior I described in my question below.
@myui if you look at the results in the paper and in David Martinez's thesis, you can see that the algorithms, as they are formulated in the paper, can both increase and decrease the learning rate according to the loss landscape. I think your interpretation that a monotonically increasing learning rate would be observed is not correct. It is, however, correct that a small initial learning rate is most of the time increased (almost monotonically) up to some limit in the initial part of training, but if you run training long enough, this is almost always followed by a decay (decrease) of the learning rate during the rest of the training. The poster here gives a quick summary: https://github.com/gbaydin/hypergradient-descent/raw/master/poster/iclr_2018_poster.pdf You can of course have your own modifications of this algorithm.
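For reference, the SGD variant of the update from the paper (SGD-HD) can be sketched as follows; the variable names are mine. The dot product is positive when consecutive gradients align (so α grows) and negative when they oppose (so α shrinks), which is exactly the increase-then-decay behavior described above:

```python
import torch

# Sketch of the SGD-HD update from the paper:
#   alpha_t = alpha_{t-1} + beta * grad_f(theta_{t-1}) . grad_f(theta_{t-2})
#   theta_t = theta_{t-1} - alpha_t * grad_f(theta_{t-1})
def sgd_hd_step(theta, alpha, beta, grad, prev_grad):
    alpha = alpha + beta * torch.dot(grad.flatten(), prev_grad.flatten()).item()
    theta = theta - alpha * grad
    return theta, alpha
```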
Negative learning rates sometimes happen, and it's not as catastrophic as it first sounds. It just means that the algorithm decides to backtrack (do gradient ascent instead of descent) under some conditions. In my observation, negative learning rates happen in the late stages of training, where the learning rate has decayed towards a very low positive value and started to fluctuate around it. If the fluctuation is too strong, and if the decayed value is close to zero, the learning rate sometimes becomes negative. I think this in effect means that the algorithm stays in the same region of the loss landscape because it has converged to a (local) optimum. My view is that it is valuable to reason about this behavior and pursue a theoretical understanding of its implications, rather than adding extra heuristics to "fix" or clip this behavior. I haven't had much time to explore this yet, but I hope to do so in the near future.
Backtracking makes sense.
I have a question about the following part of the paper:

α_t = α_{t-1} + β ∇f(θ_{t-1})・∇f(θ_{t-2})

In the dot product ∇f(θ_{t-1})・∇f(θ_{t-2}), sign(∇f(θ_{t-1})) and sign(∇f(θ_{t-2})) would often be the same. Then the learning rate α_t would increase monotonically in the above equation whenever sign(∇f(θ_{t-1})) = sign(∇f(θ_{t-2})). I assume the difference between the gradient at t-1 and the previous gradient at t-2 is usually small.
Am I missing something?
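To make the premise concrete, here is a tiny 1-D check (a hypothetical sketch of my own, not from the repository or the paper): on f(θ) = θ², consecutive gradients agree in sign while α is small, so the dot product stays positive and α grows; but once α grows enough for a step to overshoot the minimum, the gradients oppose, the dot product turns negative, and α decreases again.

```python
# Hypothetical 1-D check: f(theta) = theta**2, so grad = 2 * theta.
# While consecutive gradients agree in sign the dot product is positive and
# alpha grows; once a step overshoots the minimum the gradients oppose,
# the dot product turns negative, and alpha decreases.
theta, alpha, beta = 1.0, 0.1, 0.2
prev_grad = 2.0 * theta
theta -= alpha * prev_grad
for t in range(5):
    grad = 2.0 * theta
    alpha += beta * grad * prev_grad   # hypergradient update on alpha
    theta -= alpha * grad              # parameter update with updated alpha
    print(f"t={t}  grad*prev_grad={grad * prev_grad:+.4f}  alpha={alpha:.4f}")
    prev_grad = grad
```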