
Learning rate increases? #4

Open
myui opened this issue Dec 12, 2018 · 7 comments


myui commented Dec 12, 2018

I have a question about the following part of the paper:

[Screenshot of the learning-rate update rule from the paper: α_t = α_{t-1} + β ∇f(θ_{t-1}) · ∇f(θ_{t-2})]

In the dot product ∇f(θ_{t-1}) · ∇f(θ_{t-2}), the signs of ∇f(θ_{t-1}) and ∇f(θ_{t-2}) will often be the same.

Then the learning rate α_t would increase monotonically in the above equation whenever sign(∇f(θ_{t-1})) = sign(∇f(θ_{t-2})).

I assume the difference between the gradient at t-1 and the previous gradient at t-2 is usually small.

Am I missing something?
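
For concreteness, here is the update I am asking about written out as a toy loop on a 1-D quadratic (plain Python, not the PyTorch optimizer in this repo; the objective, α_0 and β are made up for illustration):

```python
# Toy 1-D quadratic: f(theta) = 0.5 * theta**2, so grad f(theta) = theta.
def grad(theta):
    return theta

theta, alpha, beta = 5.0, 0.01, 0.05   # illustrative starting values, not tuned
g_prev = 0.0                           # stands in for grad f(theta_{t-2}) on the first step

for t in range(10):
    g = grad(theta)                    # grad f(theta_{t-1})
    dot = g * g_prev                   # grad f(theta_{t-1}) . grad f(theta_{t-2})
    alpha = alpha + beta * dot         # additive hypergradient update of the learning rate
    theta = theta - alpha * g          # plain SGD step with the adapted learning rate
    g_prev = g
    print(f"t={t}  dot={dot:+8.3f}  alpha={alpha:+.3f}  theta={theta:+.3f}")
```

Printing a few steps shows how the sign of the dot product drives α up or down.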

@akaniklaus

Well, I don't know about the equation, but in practice it first increases the learning rate if it is too low. @gbaydin I have also seen the learning rate sometimes become negative, especially if hypergrad_lr is high. Should we maybe place a constraint (e.g. clipping) to prevent that from happening?
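
If someone wants to experiment with that, here is a minimal sketch of the kind of clipping I mean (not part of this repo's optimizer; update_lr and min_alpha are made-up names):

```python
def update_lr(alpha, beta, grad_t1, grad_t2, min_alpha=0.0):
    """Additive hypergradient update with a lower bound on the learning rate.

    grad_t1 / grad_t2 are the (flattened) gradients from steps t-1 and t-2;
    min_alpha is a hypothetical knob, and 0.0 simply forbids negative rates.
    """
    hypergrad = float((grad_t1 * grad_t2).sum())    # dot product of consecutive gradients
    return max(alpha + beta * hypergrad, min_alpha)

# e.g. alpha = update_lr(alpha, hypergrad_lr, g.flatten(), g_prev.flatten())
```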


myui commented Jan 30, 2019

By modifying the algorithm described in the original paper, Adam-HD worked fine for me:
https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/optimizer/Optimizer.java#L674

This thesis (multiplicative hypergradient descent) helped:
https://github.com/damaru2/convergence_analysis_hypergradient_descent/blob/master/dissertation_hypergradients.pdf

Negative learning rates can also be seen in the original experiments, but as I understand it that is accepted behavior. Some clipping might help, though.
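
To show the general multiplicative idea (only a sketch; see the thesis and the Hivemall source above for the exact rules used there): normalizing the dot product bounds the per-step rescaling of the learning rate, and for β < 1 it can never go negative.

```python
import numpy as np

def multiplicative_lr_update(alpha, beta, grad_t1, grad_t2, eps=1e-8):
    """Multiplicative-style hypergradient update (illustrative sketch only).

    The dot product of consecutive gradients is normalized to roughly [-1, 1],
    so alpha is rescaled by a factor in [1 - beta, 1 + beta] each step.
    """
    cos = np.dot(grad_t1, grad_t2) / (np.linalg.norm(grad_t1) * np.linalg.norm(grad_t2) + eps)
    return alpha * (1.0 + beta * cos)
```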

@akaniklaus

@myui What do you mean by "fine"? What exactly was wrong with the version that is in this repository?


myui commented Jan 30, 2019

@akaniklaus The learning rate increased monotonically under certain conditions because ∇f(θ_{t-1}) · ∇f(θ_{t-2}) will usually be greater than 0.


gbaydin commented Jan 30, 2019

@myui if you look at the results in the paper and in David Martinez's thesis, you can see that the algorithms, as they are formulated in the paper, can both increase and decrease the learning rate according to the loss landscape. I think your interpretation that a monotonically increasing learning rate would be observed is not correct. It is, however, correct that a small initial learning rate is most of the time increased (almost monotonically) up to some limit in the initial part of the training, but if you run training long enough, this is almost always followed by a decay (decrease) of the learning rate during the rest of the training. The poster here gives a quick summary: https://github.com/gbaydin/hypergradient-descent/raw/master/poster/iclr_2018_poster.pdf

You can of course have your own modifications of this algorithm.


gbaydin commented Jan 30, 2019

Negative learning rates sometimes happen, and it's not as catastrophic as it first sounds. It just means that the algorithm decides to backtrack (do gradient ascent instead of descent) under some conditions. In my observation, negative learning rates happen in the late stages of training, where the learning rate has decayed towards a very low positive value and started to fluctuate around it. If the fluctuation is too strong and the decayed value is close to zero, the learning rate sometimes becomes negative. I think this in effect means that the algorithm stays in the same region of the loss landscape because it has converged to a (local) optimum.

My view is that it is valuable to reason about this behavior and pursue a theoretical understanding of its implications, rather than adding extra heuristics to "fix" or clip this behavior. I haven't had much time to explore this yet, but hope to do so in the near future.


myui commented Jan 30, 2019

Backtracking makes sense.
