Tag: evaluation
3 verified claims carry this tag. Each has at least two primary sources and an HMAC-SHA256 signature.
MMLU benchmark introduced in paper: Measuring Massive Multitask Language Understanding (Hendrycks et al., 2020).
428d754e7c651be6 · 2 sources · 100% confidence
SuperGLUE benchmark introduced in paper: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (Wang et al., 2019).
1a1e87145608c91a · 2 sources · 100% confidence
GLUE benchmark introduced in paper: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Wang et al., 2018).
aa113b5e61d5c214 · 2 sources · 100% confidence
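As a rough illustration of what an HMAC-SHA256 signature on a claim could involve, here is a minimal Python sketch. It is not this site's actual signing scheme: the key (`DEMO_KEY`), the function names (`sign_claim`, `verify_claim`), and the choice to sign the UTF-8 bytes of the claim text are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical demo key (assumption); a real system would load a secret
# from secure storage rather than hard-coding it.
DEMO_KEY = b"demo-key-not-a-real-secret"


def sign_claim(key: bytes, claim_text: str) -> str:
    """Return the hex-encoded HMAC-SHA256 tag of the UTF-8 claim text."""
    return hmac.new(key, claim_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_claim(key: bytes, claim_text: str, signature_hex: str) -> bool:
    """Recompute the tag and compare it to the given one in constant time."""
    expected = sign_claim(key, claim_text)
    return hmac.compare_digest(expected, signature_hex)


claim = (
    "GLUE benchmark introduced in paper: GLUE: A Multi-Task Benchmark and "
    "Analysis Platform for Natural Language Understanding (Wang et al., 2018)."
)
signature = sign_claim(DEMO_KEY, claim)

print(verify_claim(DEMO_KEY, claim, signature))                # True
print(verify_claim(DEMO_KEY, claim + " (edited)", signature))  # False
```

Any change to the claim text or the key yields a different tag, so verification fails for altered claims. The sketch assumes no relationship between the 16-character identifiers listed with each claim and the signature itself.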