Verified claim · AI-ML · 100% confidence

C4 (Colossal Clean Crawled Corpus) was introduced in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (Raffel et al., 2019).

Name: SourceScore Claim 0d24c97977ebd744
Creator: SourceScore
License: https://creativecommons.org/licenses/by/4.0/
Keywords: c4, dataset, pretraining, google, 2019

Last verified 2026-05-16 · Methodology veritas-v0.1 · 0d24c97977ebd744

Structured fields

Subject: C4 (Colossal Clean Crawled Corpus)
Predicate: introduced_in_paper
Object: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
Confidence: 100%
Tags: c4 · dataset · pretraining · google · 2019

Sources (2)

[1] preprint · arXiv (Raffel et al.) · 2019-10-23
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
“We call the resulting dataset the 'Colossal Clean Crawled Corpus' (or C4 for short).”
[2] docs · Google / TensorFlow
c4 — TensorFlow Datasets catalog

Cite this claim

Ready-to-paste citation (Markdown / plain text):

C4 (Colossal Clean Crawled Corpus) introduced in paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019). — SourceScore Claim 0d24c97977ebd744 (verified 2026-05-16). https://sourcescore.org/api/v1/claims/0d24c97977ebd744.json

Embed this claim

Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim, its primary source, and a link back to this canonical page. Licensed CC BY 4.0; attribution is included.

<iframe src="https://sourcescore.org/embed/claim/0d24c97977ebd744/" width="100%" height="360" frameborder="0" loading="lazy" title="C4 (Colossal Clean Crawled Corpus) introduced in paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)."></iframe>



Programmatic access

Fetch this claim, wrapped in a signed envelope for verification:

curl https://sourcescore.org/api/v1/claims/0d24c97977ebd744.json
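A minimal Python sketch of the same request, using only the standard library. The helper names `claim_url` and `fetch_claim` are hypothetical; the URL pattern is copied from the curl command above, and the sketch assumes the response body is a JSON object. It does not check the envelope's signature, because the signing scheme is not described on this page.

```python
import json
import urllib.request

# URL pattern taken from the curl command on this page (assumed stable).
BASE_URL = "https://sourcescore.org/api/v1/claims"


def claim_url(claim_id: str) -> str:
    """Build the JSON endpoint URL for a claim ID such as 0d24c97977ebd744."""
    return f"{BASE_URL}/{claim_id}.json"


def fetch_claim(claim_id: str, timeout: float = 10.0) -> dict:
    """GET the claim envelope and decode it as JSON.

    Assumption: the body is a JSON object. Its exact schema is not documented
    here, so inspect the keys instead of relying on specific field names.
    The signature is NOT verified by this helper.
    """
    with urllib.request.urlopen(claim_url(claim_id), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_claim("0d24c97977ebd744")` performs the same GET as the curl command and returns the decoded envelope.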
