SourceScore
SourceScore VERITAS · verified claim · 100% confidence

Anthropic Constitutional Classifiers were publicly released by Anthropic on 2025-02-04 as a safeguard against jailbreaks, using input/output filters trained from a constitution.

Subject
Anthropic Constitutional Classifiers
Predicate
publicly_released_on
Object
2025-02-04 by Anthropic; safeguard against jailbreaks via constitution-trained input/output filters
Primary source · official blog · 2025-02-04
Constitutional Classifiers: Defending against universal jailbreaks · Anthropic
Last verified 2026-05-16 · 2 sources · 688a84a8d7211fc0