Tag
1 verified claim carrying this tag. Each has 2+ primary sources and an HMAC-SHA256 signature.
Direct Preference Optimization (DPO) was introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023).
a3e691683a4577af · 2 sources · 100% confidence
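The HMAC-SHA256 signature mentioned above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the secret key, the canonical-JSON payload encoding, and the field names (`tag`, `text`, `sources`) are all assumptions introduced for the example.

```python
import hashlib
import hmac
import json

def sign_claim(secret_key: bytes, claim: dict) -> str:
    # Serialize the claim to canonical JSON (sorted keys, no whitespace)
    # so the same claim always produces the same signature.
    payload = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()

def verify_claim(secret_key: bytes, claim: dict, signature: str) -> bool:
    # compare_digest avoids timing side channels when checking signatures.
    return hmac.compare_digest(sign_claim(secret_key, claim), signature)

# Hypothetical claim record mirroring the card above.
claim = {
    "tag": "dpo",
    "text": "DPO introduced in Rafailov et al., 2023",
    "sources": 2,
}
sig = sign_claim(b"example-secret", claim)
```

A verifier holding the same key recomputes the HMAC and accepts the claim only if the digests match; tampering with any field changes the signature.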