Hi,

I am reading your code and have a question about `evaluate_actions` during the PPO update. I notice that you compute `dist_entropy` along with the action and value losses, and it takes part in backpropagation. Since `entropy_coef` defaults to 0 in your code, `dist_entropy` currently has no effect, but I am still curious how it works and why you use it (and what exactly "An ugly hack for my KFAC implementation." means 😝).

Thanks
The dist_entropy term is described in the original PPO paper (https://arxiv.org/pdf/1707.06347.pdf); see Equation 9. The trick has been used in earlier papers as well. The purpose is to encourage exploration.
We set entropy_coef to 0 because the agent already solves these tasks without the entropy bonus. But if the policy gets stuck in a local minimum, increasing this coefficient might help it find a better solution.
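A minimal sketch of how such an entropy bonus typically enters the combined loss, using standalone tensors in place of the quantities returned by `evaluate_actions` (the values are placeholders, and the name `value_loss_coef` is assumed here for the value-loss weight; `entropy_coef` is the coefficient discussed above):

```python
import torch

# Placeholder scalars standing in for the quantities computed during the update:
# the clipped-surrogate policy loss, the value-function loss, and the mean
# entropy of the action distribution.
action_loss = torch.tensor(0.5, requires_grad=True)
value_loss = torch.tensor(0.2, requires_grad=True)
dist_entropy = torch.tensor(1.3, requires_grad=True)

value_loss_coef = 0.5  # weight on the value loss (c1 in Eq. 9 of the PPO paper)
entropy_coef = 0.0     # weight on the entropy bonus (c2 in Eq. 9); 0 disables it

# Combined objective from Eq. 9, written as a loss to minimize: subtracting
# the entropy term rewards higher-entropy (more exploratory) policies.
loss = action_loss + value_loss_coef * value_loss - entropy_coef * dist_entropy
loss.backward()
```

With entropy_coef = 0 the entropy term contributes nothing to the gradient; a small positive value (e.g. 0.01) penalizes policies that collapse too quickly to near-deterministic actions.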