
DPO training makes the model even worse #394

Closed Answered by rasbt
jingedawang asked this question in Q&A

Yeah, in my experience, DPO can be very tricky and finicky. Even when the DPO loss improves, the model can still get worse (it's also susceptible to collapse). I remember a bunch of papers discussing this; I think one of them was this one, which may be helpful here: https://arxiv.org/abs/2402.13228
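In case it helps, here's a minimal sketch of the standard DPO objective (as in the original DPO paper, not necessarily the exact code in this repo); the tensor values below are made up purely for illustration. It shows why a falling DPO loss isn't a guarantee of a better model: the loss only depends on the gap between chosen and rejected log-probabilities relative to the frozen reference, so it can keep improving even while the chosen responses themselves become less likely under the policy, which is one way the collapse mentioned above shows up.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss on per-sequence summed log-probabilities."""
    # Implicit rewards: how far the policy has moved away from the
    # frozen reference model on the chosen vs. rejected responses.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # The loss only cares about the *gap* between the two rewards, so it
    # can keep decreasing while the absolute log-probability of the
    # chosen answers (and generation quality) degrades.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
    return loss.mean(), chosen_rewards.mean(), rejected_rewards.mean()


# Toy illustration with made-up log-probs: the margin widens, so the loss
# is small, even though the chosen responses became *less* likely than
# they were under the reference model.
policy_chosen = torch.tensor([-120.0, -95.0])
policy_rejected = torch.tensor([-160.0, -140.0])
ref_chosen = torch.tensor([-100.0, -90.0])
ref_rejected = torch.tensor([-110.0, -100.0])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

That's also why it usually helps to track the chosen/rejected reward margins and some generation samples alongside the loss, rather than relying on the loss curve alone.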
