Feasibility of replacing GRP with DeepCFR #70
CHANGNIANLE started this conversation in Ideas
Replies: 2 comments 1 reply
-
Interesting idea. Reward design has always been a hard problem, and reward shaping like the current GRP is not very precise either.
0 replies
-
Do you have any preliminary test data? I read this article, and it looks like LuckyJ may also use a similar method: link
1 reply
-
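Implementation steps:
1. Modify the training data so that each sample contains all of the actions from every turn of an entire game.
2. From the mask_bits and q_values returned with each action, together with the game result, build a CFR game tree.
3. Run the CFR algorithm to compute updated q_values for every node of the game tree, compare them with the original q_values, and fit the model to them.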
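A minimal Rust sketch of the data flow in steps 2 and 3, under several assumptions. The Step struct, the dense one-entry-per-action q_values layout, and every function name here are hypothetical, not Mortal's types. Rewards are assumed to be from the recording player's perspective. The "CFR" part is reduced to a single regret-matching iteration starting from a uniform strategy, with the taken action's value backed up from the game result. Real (Deep)CFR needs counterfactual values weighted by the opponents' reach probabilities, which one recorded trajectory does not provide, so this only illustrates where mask_bits, q_values and the result would enter.

```rust
/// One recorded decision of a single player.
/// Hypothetical layout, not Mortal's real types.
struct Step {
    mask_bits: u64,     // bit i set => action i is legal
    q_values: Vec<f32>, // assumed dense: one model estimate per action index
    action: usize,      // index of the action actually taken
}

fn is_legal(mask_bits: u64, i: usize) -> bool {
    i < 64 && (mask_bits >> i) & 1 == 1
}

/// Expected value of `q` under `strategy`, summing over legal actions only.
fn expected(strategy: &[f32], q: &[f32], mask_bits: u64) -> f32 {
    (0..q.len())
        .filter(|&i| is_legal(mask_bits, i))
        .map(|i| strategy[i] * q[i])
        .sum()
}

/// Regret matching: play each legal action in proportion to its positive
/// regret; fall back to uniform over legal actions if no regret is positive.
fn regret_matching(regrets: &[f32], mask_bits: u64) -> Vec<f32> {
    let pos: Vec<f32> = regrets
        .iter()
        .enumerate()
        .map(|(i, &r)| if is_legal(mask_bits, i) { r.max(0.0) } else { 0.0 })
        .collect();
    let sum: f32 = pos.iter().sum();
    if sum > 0.0 {
        return pos.iter().map(|p| p / sum).collect();
    }
    let n = (0..regrets.len()).filter(|&i| is_legal(mask_bits, i)).count() as f32;
    (0..regrets.len())
        .map(|i| if is_legal(mask_bits, i) { 1.0 / n } else { 0.0 })
        .collect()
}

/// Walks one game backwards. The taken action's value is replaced by the value
/// backed up from the next decision (the final reward at the last one); untaken
/// legal actions keep the model's estimates. Returns (target q, strategy) per step.
fn backward_targets(steps: &[Step], final_reward: f32) -> Vec<(Vec<f32>, Vec<f32>)> {
    let mut out = Vec::with_capacity(steps.len());
    let mut backed_up = final_reward;
    for step in steps.iter().rev() {
        let mut q = step.q_values.clone();
        q[step.action] = backed_up;
        // Start from a uniform strategy over legal actions...
        let zeros = vec![0.0f32; q.len()];
        let uniform = regret_matching(&zeros, step.mask_bits);
        let baseline = expected(&uniform, &q, step.mask_bits);
        // ...measure each action's regret against it, then take one regret-matching step.
        let regrets: Vec<f32> = q.iter().map(|v| v - baseline).collect();
        let strategy = regret_matching(&regrets, step.mask_bits);
        // The node value under the new strategy is backed up to the previous decision.
        backed_up = expected(&strategy, &q, step.mask_bits);
        out.push((q, strategy));
    }
    out.reverse();
    out
}

/// Mean squared error over legal actions between the model's q_values and the
/// targets: the kind of quantity a network would be trained to reduce in step 3.
fn masked_mse(pred: &[f32], target: &[f32], mask_bits: u64) -> f32 {
    let legal: Vec<usize> = (0..pred.len()).filter(|&i| is_legal(mask_bits, i)).collect();
    if legal.is_empty() {
        return 0.0;
    }
    let sum: f32 = legal.iter().map(|&i| (pred[i] - target[i]).powi(2)).sum();
    sum / legal.len() as f32
}
```

Backing up along the single path that was actually played keeps the sketch cheap, but it is exactly what makes it weaker than CFR: the untaken branches are never expanded and only keep the model's own estimates.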
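Mortal 4.0 looks almost as if it had been designed with DeepCFR in mind.
For now this is only an idea, proposed as a replacement for GRP.
Feasibility is still unknown.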
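Problems encountered:
1. (Solved) With enable_quick_eval=True, the dahai that follows a reach had no meta structure, which made the CFR game tree less accurate.
Modify mortal.rs -> fn set_scene (line 210):
add a !cans.can_reach_accepted condition.
In update.rs, inside the if actor_rel == 0 {} block of fn reach and of fn reach_accepted, add self.last_cans.can_reach_accepted = true; and self.last_cans.can_reach_accepted = false; respectively.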
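A simplified Rust model of the fix above, to show why the flag works: it is true only between the player's own reach and its reach_accepted, so the riichi-declaring dahai is never quick-evaluated and therefore always gets meta. Cans, State, use_quick_eval and its existing_criteria_met parameter are hypothetical stand-ins, not libriichi's real types or signatures. The post does not say which condition inside set_scene the new term joins; treating it as a gate on quick evaluation is an assumption.

```rust
/// Hypothetical stand-in for the action-candidate flags; only the new field is modeled.
#[derive(Default)]
struct Cans {
    // True only between our own reach and its reach_accepted,
    // i.e. while the riichi-declaring dahai is being chosen.
    can_reach_accepted: bool,
}

/// Hypothetical stand-in for the state that update.rs mutates.
struct State {
    last_cans: Cans,
}

impl State {
    fn reach(&mut self, actor_rel: u8) {
        // actor_rel == 0 means the event is our own.
        if actor_rel == 0 {
            self.last_cans.can_reach_accepted = true;
        }
    }

    fn reach_accepted(&mut self, actor_rel: u8) {
        if actor_rel == 0 {
            self.last_cans.can_reach_accepted = false;
        }
    }
}

/// `existing_criteria_met` stands for whatever quick-eval conditions already exist;
/// the added `!cans.can_reach_accepted` term disables quick eval for the riichi discard.
fn use_quick_eval(enable_quick_eval: bool, existing_criteria_met: bool, cans: &Cans) -> bool {
    enable_quick_eval && existing_criteria_met && !cans.can_reach_accepted
}
```

In this sketch, clearing the flag in reach_accepted is what restores quick evaluation for the decisions that follow the riichi discard.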