Direct Preference Optimization (DPO) was introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023).
Subject
Direct Preference Optimization (DPO)
Predicate
introduced_in_paper
Object
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
Primary source · preprint · 2023-05-29
Direct Preference Optimization: Your Language Model is Secretly a Reward Model — arXiv (Rafailov, Sharma, Mitchell, Ermon, Manning, Finn)