SourceScore

Verified claim · AI-ML · 100% confidence

Direct Preference Optimization (DPO) was introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023).

Last verified 2026-05-16 · Methodology veritas-v0.1 · a3e691683a4577af

Structured fields

Subject
Direct Preference Optimization (DPO)
Predicate
introduced_in_paper
Object
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
Confidence
100%
Tags
dpo · alignment · foundational · rafailov · 2023 · nips · stanford
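
The fields above form a subject/predicate/object triple with a confidence and a tag list. As a rough illustration of how a consumer might hold such a record in memory, here is a minimal Python sketch. The `Claim` class and the `parse_confidence` and `parse_tags` helpers are hypothetical names introduced here, not part of any SourceScore library, and the sketch assumes the values arrive as the plain strings shown on this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Claim:
    """Hypothetical in-memory form of the structured fields on this page."""
    subject: str
    predicate: str
    obj: str                 # "object" shadows a Python builtin, hence "obj"
    confidence: float        # fraction in [0, 1]; "100%" becomes 1.0
    tags: Tuple[str, ...]


def parse_confidence(text: str) -> float:
    """Turn a percentage string such as "100%" into a fraction."""
    return float(text.strip().rstrip("%")) / 100.0


def parse_tags(text: str) -> Tuple[str, ...]:
    """Split the " · "-separated tag line into individual tags."""
    return tuple(t.strip() for t in text.split("·") if t.strip())


claim = Claim(
    subject="Direct Preference Optimization (DPO)",
    predicate="introduced_in_paper",
    obj=("Direct Preference Optimization: Your Language Model is "
         "Secretly a Reward Model (Rafailov et al., 2023)"),
    confidence=parse_confidence("100%"),
    tags=parse_tags("dpo · alignment · foundational · rafailov · 2023 · nips · stanford"),
)
```

Freezing the dataclass keeps the record immutable, which suits a claim that is meant to be cited verbatim.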

Sources (2)

  1. preprint · arXiv (Rafailov, Sharma, Mitchell, Ermon, Manning, Finn) · 2023-05-29

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    "In this paper, we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss."

  2. peer-reviewed · NeurIPS Foundation · 2023-12-10

    Direct Preference Optimization (NeurIPS 2023 proceedings)

Cite this claim

Ready-to-paste citation (Markdown / plain text):

Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023). — SourceScore Claim a3e691683a4577af (verified 2026-05-16). https://sourcescore.org/api/v1/claims/a3e691683a4577af.json

Embed this claim

Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim, its primary source, and a link back to this canonical page. Licensed CC-BY 4.0; attribution is included.

<iframe src="https://sourcescore.org/embed/claim/a3e691683a4577af/" width="100%" height="360" frameborder="0" loading="lazy" title="Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)."></iframe>


Programmatic access

Fetch this claim with a signed envelope for verification:

curl https://sourcescore.org/api/v1/claims/a3e691683a4577af.json
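
The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The function names `claim_url` and `fetch_claim` are hypothetical, and because this page does not describe the layout of the signed envelope, the sketch simply returns the parsed JSON and does not attempt any signature verification.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://sourcescore.org/api/v1/claims"


def claim_url(claim_id: str) -> str:
    """Build the JSON endpoint URL for a claim ID."""
    return f"{BASE_URL}/{urllib.parse.quote(claim_id, safe='')}.json"


def fetch_claim(claim_id: str, timeout: float = 10.0):
    """Fetch a claim and return the decoded JSON document unchanged.

    The response schema is not assumed here; callers should inspect
    the returned object before relying on any particular key.
    """
    with urllib.request.urlopen(claim_url(claim_id), timeout=timeout) as resp:
        return json.load(resp)


# Example (requires network access):
#   envelope = fetch_claim("a3e691683a4577af")
#   print(json.dumps(envelope, indent=2))
```

Keeping URL construction separate from the network call makes the former easy to check offline.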
