Anthropic Constitutional Classifiers were publicly released by Anthropic on 2025-02-04 as a safeguard against jailbreaks, using input/output filters trained from a constitution.
Subject
Anthropic Constitutional Classifiers
Predicate
publicly_released_on
Object
2025-02-04 by Anthropic; safeguard against jailbreaks via constitution-trained input/output filters
Primary source · official blog · 2025-02-04
Constitutional Classifiers: Defending against universal jailbreaks — Anthropic