Hi,

I am reading your code and have a question about `evaluate_actions` during the PPO update. I notice that you compute `dist_entropy` along with the action and value losses, and it takes part in backpropagation. Since `entropy_coef` defaults to 0 in your code, `dist_entropy` currently has no effect, but I am still curious how it works and why you use it (and what exactly "An ugly hack for my KFAC implementation." means 😝).

Thanks
The dist_entropy term is described in the original PPO paper (https://arxiv.org/pdf/1707.06347.pdf); see Equation 9. The trick has been used in earlier papers as well. The purpose is to encourage exploration.
We set entropy_coef to 0 because the agent already solves these tasks without the entropy bonus. But if the policy gets stuck in a local minimum, increasing this coefficient might help it find a better solution.
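A minimal sketch of how such an entropy bonus typically enters the combined loss, using standalone tensors in place of the quantities returned by `evaluate_actions` (the values are placeholders, and the name `value_loss_coef` is assumed here for the value-loss weight; `entropy_coef` is the coefficient discussed above):

```python
import torch

# Placeholder scalars standing in for the quantities computed during the update:
# the clipped-surrogate policy loss, the value-function loss, and the mean
# entropy of the action distribution.
action_loss = torch.tensor(0.5, requires_grad=True)
value_loss = torch.tensor(0.2, requires_grad=True)
dist_entropy = torch.tensor(1.3, requires_grad=True)

value_loss_coef = 0.5  # weight on the value loss (c1 in Eq. 9 of the PPO paper)
entropy_coef = 0.0     # weight on the entropy bonus (c2 in Eq. 9); 0 disables it

# Combined objective from Eq. 9, written as a loss to minimize: subtracting
# the entropy term rewards higher-entropy (more exploratory) policies.
loss = action_loss + value_loss_coef * value_loss - entropy_coef * dist_entropy
loss.backward()
```

With entropy_coef = 0 the entropy term contributes nothing to the gradient; a small positive value (e.g. 0.01) penalizes policies that collapse too quickly to near-deterministic actions.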