Thank you so much for sharing the data. It's very helpful for the RLHF community!
I found some of the hyper-parameters for training UltraCM in your paper, but I still have the following questions:
How do you prepare the training examples? It seems that the instruction, the completion, the feedback, and the overall score are filled into the ultracm_instruction_template defined on your demo page, but I'm not sure.
How is the loss calculated? Did you apply masking to the input content, including the instruction and completion?
Did you compare tuning a critique model from an SFT model versus a pretrained checkpoint?
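To make the masking question concrete, here is a minimal sketch of the kind of input masking I have in mind. It is only my assumption of a common setup, not something taken from the paper: the prompt tokens (the filled template with instruction and completion) get the label -100, which PyTorch's `CrossEntropyLoss` ignores by default, so only the feedback and score tokens contribute to the loss. The helper name `build_labels` and the token ids are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response ids and mask the prompt in the labels.

    prompt_ids:   token ids of the filled template (instruction + completion)
    response_ids: token ids of the target text (feedback + overall score)
    """
    input_ids = list(prompt_ids) + list(response_ids)
    # Prompt positions are excluded from the loss; response positions are kept.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Hypothetical token ids, for illustration only.
prompt_ids = [101, 7, 8, 9]
response_ids = [42, 43, 2]
input_ids, labels = build_labels(prompt_ids, response_ids)
```

If no masking was applied, the labels would simply be a copy of `input_ids`, and the model would also be trained to reproduce the instruction and completion.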
Thanks again for your efforts!