SourceScore
SourceScore VERITAS · verified claim · 100% confidence

The Pile dataset was released on 2020-12-31.

Subject: The Pile dataset
Predicate: released_on
Object: 2020-12-31
Primary source · preprint · 2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling · arXiv · Gao, Biderman, Black, Golding, Hoppe, Foster, Phang, He, Thite, Nabeshima, Presser, Leahy
Last verified 2026-05-16 · 2 sources · 4aef1422b96df26c