SourceScore

Verified claim · AI-ML · 100% confidence

MMLU benchmark introduced in paper: Measuring Massive Multitask Language Understanding (Hendrycks et al., 2020).

Last verified 2026-05-16 · Methodology veritas-v0.1 · 428d754e7c651be6

Structured fields

Subject: MMLU benchmark
Predicate: introduced_in_paper
Object: Measuring Massive Multitask Language Understanding (Hendrycks et al., 2020)
Confidence: 100%
Tags: mmlu · benchmark · hendrycks · 2020 · iclr · evaluation

Sources (2)

  [1] preprint · arXiv (Hendrycks et al.) · 2020-09-07

    Measuring Massive Multitask Language Understanding
    We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
  [2] peer-reviewed · OpenReview / ICLR · 2021-05-04

    Measuring Massive Multitask Language Understanding (ICLR 2021)

Cite this claim

Ready-to-paste citation (Markdown / plain text):

MMLU benchmark introduced in paper: Measuring Massive Multitask Language Understanding (Hendrycks et al., 2020). — SourceScore Claim 428d754e7c651be6 (verified 2026-05-16). https://sourcescore.org/api/v1/claims/428d754e7c651be6.json
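
If you generate citations programmatically, the same string can be assembled from the claim's fields. A minimal Python sketch; the variable names below are illustrative, not part of any SourceScore schema:

claim_text = (
    "MMLU benchmark introduced in paper: Measuring Massive Multitask "
    "Language Understanding (Hendrycks et al., 2020)."
)
claim_id = "428d754e7c651be6"
verified = "2026-05-16"
url = f"https://sourcescore.org/api/v1/claims/{claim_id}.json"

# Mirrors the ready-to-paste format shown above.
citation = f"{claim_text} — SourceScore Claim {claim_id} (verified {verified}). {url}"
print(citation)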

Embed this claim

Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim, its primary source, and a click-through link to this canonical page. Licensed CC-BY 4.0; attribution is included.

<iframe src="https://sourcescore.org/embed/claim/428d754e7c651be6/" width="100%" height="360" style="border:0" loading="lazy" title="MMLU benchmark introduced in paper: Measuring Massive Multitask Language Understanding (Hendrycks et al., 2020)."></iframe>

Programmatic access

Fetch this claim as a signed envelope for verification:

curl https://sourcescore.org/api/v1/claims/428d754e7c651be6.json
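
The same request from Python, using only the standard library. A minimal sketch: the field names subject, predicate, object, and confidence are assumptions mirroring the structured fields above, and signature verification is omitted because the envelope format isn't documented on this page.

import json
import urllib.request

CLAIM_ID = "428d754e7c651be6"
URL = f"https://sourcescore.org/api/v1/claims/{CLAIM_ID}.json"

# Plain GET; no authentication is assumed for public claims.
with urllib.request.urlopen(URL) as resp:
    envelope = json.load(resp)

# Assumed field names -- adjust after inspecting the real schema.
print(envelope.get("subject"))     # e.g. "MMLU benchmark"
print(envelope.get("predicate"))   # e.g. "introduced_in_paper"
print(envelope.get("object"))
print(envelope.get("confidence"))

# Verifying the signed envelope would require the published signing key
# and envelope format, which this page does not specify; omitted here.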
