I think RLHF is done after SFT is finished; in theory that's how it works. I also hope the author can open-source the RM and PPO code!
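Here is a minimal sketch of why SFT normally comes first. In the InstructGPT-style pipeline, the PPO stage starts from the SFT model and keeps that SFT model frozen as a reference. The reward PPO maximizes is the reward model (RM) score minus a KL penalty for drifting away from the SFT reference. The function name `kl_penalized_reward` and the default `beta` are hypothetical. This is not the Chinese-Alpaca project's code.

```python
def kl_penalized_reward(rm_score, policy_logprobs, ref_logprobs, beta=0.1):
    """Sequence-level reward for the PPO stage of RLHF (illustrative sketch).

    rm_score: scalar score the reward model (RM) assigns to the full response.
    policy_logprobs: per-token log-probs of the sampled response under the policy being trained.
    ref_logprobs: per-token log-probs of the same tokens under the frozen SFT model.
    beta: KL penalty coefficient (hypothetical default; tuned in practice).
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Single-sample estimate of KL(policy || SFT reference) for this response.
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    # A high RM score is rewarded; straying far from the SFT model is penalized.
    return rm_score - beta * kl_estimate
```

For example, `kl_penalized_reward(1.0, [-0.5, -1.0], [-0.75, -1.25], beta=0.5)` gives a KL estimate of 0.5 and returns 0.75. Without an SFT model there would be no sensible starting policy or reference distribution. That is why RLHF is usually described as a stage after SFT rather than a replacement for it.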
Would replacing SFT in Chinese-Alpaca with RLHF + PPO for instruction fine-tuning give better results?