Verified claim · AI-ML · 100% confidence
Constitutional AI (CAI) introduced in paper: Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022).
Last verified 2026-05-16 · Methodology veritas-v0.1 · ba1eb83c14795107
Structured fields
- Subject
- Constitutional AI (CAI)
- Predicate
- introduced_in_paper
- Object
- Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022)
- Confidence
- 100%
- Tags
- constitutional-ai · alignment · anthropic · 2022 · bai
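The structured fields above form a subject/predicate/object triple with a confidence value and tags. A minimal sketch of how a consumer might hold that record in memory follows; the class name `Claim`, its field names, and the helper `parse_tags` are illustrative assumptions, not SourceScore's published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Field names are illustrative assumptions, not the site's schema.
    subject: str
    predicate: str
    obj: str
    confidence: float  # 1.0 stands for the page's "100%"
    tags: tuple = ()


def parse_tags(raw: str) -> tuple:
    """Split a ' · '-separated tag line into a tuple of tag strings."""
    return tuple(t.strip() for t in raw.split("·") if t.strip())


# Values copied from the structured fields listed on this page.
claim = Claim(
    subject="Constitutional AI (CAI)",
    predicate="introduced_in_paper",
    obj="Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022)",
    confidence=1.0,
    tags=parse_tags("constitutional-ai · alignment · anthropic · 2022 · bai"),
)
```

Keeping tags as a tuple makes the record hashable, which may be convenient if claims are later grouped by shared tags.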
Sources (2)
[1] preprint · arXiv (Bai et al., Anthropic) · 2022-12-15
Constitutional AI: Harmlessness from AI Feedback
“We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'.”
[2] official blog · Anthropic · 2022-12-15
Constitutional AI: Harmlessness from AI Feedback
Cite this claim
Ready-to-paste citation (Markdown / plain text):
Constitutional AI (CAI) introduced in paper: Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022). — SourceScore Claim ba1eb83c14795107 (verified 2026-05-16). https://sourcescore.org/api/v1/claims/ba1eb83c14795107.json
Embed this claim
Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim + primary source + click-through to this canonical page. CC-BY 4.0; attribution included.
<iframe src="https://sourcescore.org/embed/claim/ba1eb83c14795107/" width="100%" height="360" frameborder="0" loading="lazy" title="Constitutional AI (CAI) introduced in paper: Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022)."></iframe>
Related claims
Other verified claims sharing tags with this one — useful for LLM retrieval graphs and citation discovery.
InstructGPT methodology introduced in paper: Training language models to follow instructions with human feedback (Ouyang et al., 2022).
5da8f8dffc038b8e · 100% confidence · shares 2 tags (alignment, 2022)
Reinforcement Learning from Human Feedback (RLHF) introduced in paper: Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017).
67866330cd60e54d · 100% confidence · shares 1 tag (alignment)
Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023).
a3e691683a4577af · 100% confidence · shares 1 tag (alignment)
FlashAttention introduced in paper: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Dao et al., 2022).
e120182d1e01ea2b · 100% confidence · shares 1 tag (2022)
ChatGPT released on: 2022-11-30.
8d653880c519a8ef · 100% confidence · shares 1 tag (2022)
Programmatic access
Fetch this claim with a signed envelope for verification:
curl https://sourcescore.org/api/v1/claims/ba1eb83c14795107.json
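The same request can be made from a script. The sketch below uses only the Python standard library and assumes the endpoint returns a JSON object; the response schema and the signature scheme of the envelope are not described on this page, so the code decodes the body and stops short of verifying anything. The names `claim_url` and `fetch_claim` are hypothetical helpers.

```python
import json
from urllib.request import urlopen

# Base path taken from the curl command above.
BASE_URL = "https://sourcescore.org/api/v1/claims"


def claim_url(claim_id: str) -> str:
    """Build the JSON endpoint URL for a claim ID."""
    return f"{BASE_URL}/{claim_id}.json"


def fetch_claim(claim_id: str, opener=urlopen, timeout: float = 10.0):
    """Fetch and decode the claim JSON.

    `opener` is injectable so the function can be exercised offline;
    by default it performs a real HTTP GET via urllib.
    """
    with opener(claim_url(claim_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_claim("ba1eb83c14795107")` would issue the same GET as the curl command. Inspecting the top-level keys of the result is a reasonable first step before relying on any particular field.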