SourceScore VERITAS · verified claim
100% confidence
The Pile dataset was released on 2020-12-31.
Subject: The Pile dataset
Predicate: released_on
Object: 2020-12-31
Primary source · preprint · 2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
arXiv (Gao, Biderman, Black, Golding, Hoppe, Foster, Phang, He, Thite, Nabeshima, Presser, Leahy)
Last verified 2026-05-16 · 2 sources · 4aef1422b96df26c