SourceScore
VERITAS · verified claim · 100% confidence

Direct Preference Optimization (DPO) was introduced in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023).

Subject
Direct Preference Optimization (DPO)
Predicate
introduced_in_paper
Object
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
Primary source · preprint · 2023-05-29
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov, Sharma, Mitchell, Ermon, Manning, Finn), arXiv
Last verified 2026-05-16 · 2 sources · a3e691683a4577af