Tag: dataset
5 verified claims carrying this tag. Each has 2+ primary sources and an HMAC-SHA256 signature.
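A minimal sketch of how an HMAC-SHA256 signature over a claim could be produced and checked, using Python's standard `hmac` and `hashlib` modules. This page does not document its actual signing scheme. The secret key, the JSON canonicalization, and the claim field names below are hypothetical. The 16-character hex IDs shown with each claim are not assumed to be signatures.

```python
import hashlib
import hmac
import json

# Hypothetical secret key; the real signing key is not published on this page.
SECRET_KEY = b"example-secret-key"


def canonical_message(claim: dict) -> bytes:
    """Serialize a claim deterministically (sorted keys, compact separators).
    This canonical form is an assumption made for illustration only."""
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_claim(claim: dict, key: bytes = SECRET_KEY) -> str:
    """Return the hex-encoded HMAC-SHA256 tag (64 hex chars) for a claim."""
    return hmac.new(key, canonical_message(claim), hashlib.sha256).hexdigest()


def verify_claim(claim: dict, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_claim(claim, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical claim record modeled on the "Common Crawl founded in: 2007" entry.
claim = {
    "subject": "Common Crawl",
    "predicate": "founded in",
    "object": "2007",
    "tag": "dataset",
    "sources": 2,
}
sig = sign_claim(claim)
tampered = dict(claim, object="2008")

print(verify_claim(claim, sig))     # True
print(verify_claim(tampered, sig))  # False: any edit to the claim breaks the tag
```

Because HMAC depends on a shared secret, only a party holding the key can produce or check a valid tag. Checking a signature therefore shows that the claim is unchanged since signing. It does not show that the claim itself is true.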
ImageNet dataset introduced in paper: "ImageNet: A Large-Scale Hierarchical Image Database" (Deng et al., 2009).
045e628def62181d · 2 sources · 100% confidence
Common Crawl founded in: 2007.
4a2689e6230ef2e1 · 2 sources · 95% confidence
C4 (Colossal Clean Crawled Corpus) introduced in paper: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (Raffel et al., 2019).
0d24c97977ebd744 · 2 sources · 100% confidence
The Pile dataset released on: 2020-12-31.
4aef1422b96df26c · 2 sources · 100% confidence
RedPajama dataset released on: 2023-04-17.
ea8b7be3a49101be · 2 sources · 95% confidence