SourceScore VERITAS · verified claim · 100% confidence

The HumanEval benchmark was introduced in the paper "Evaluating Large Language Models Trained on Code" (Chen et al., 2021).
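The paper named in this claim is best known for the pass@k metric it defines for scoring HumanEval completions. A minimal sketch of its unbiased estimator, where n samples are drawn per problem and c of them pass the unit tests (the function name and example numbers are illustrative, not part of this claim record):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021).

    n: total completions sampled per problem
    c: completions that passed all unit tests
    k: evaluation budget (probability at least one of k passes)
    """
    if n - c < k:
        # Fewer failures than the budget: some draw of k must contain a pass.
        return 1.0
    # 1 - P(all k drawn completions are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 2 samples, 1 correct, budget of 1 -> probability 0.5
print(pass_at_k(2, 1, 1))
```

Averaging this estimator over all benchmark problems gives the reported pass@k score.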

Subject: HumanEval benchmark
Predicate: introduced_in_paper
Object: Evaluating Large Language Models Trained on Code (Chen et al., 2021)
Primary source · preprint · 2021-07-07
Evaluating Large Language Models Trained on Code, arXiv (Chen et al., OpenAI)
Last verified 2026-05-16 · 2 sources · 71ec42731d2c9e0c