SourceScore
SourceScore VERITAS · verified claim · 92% confidence

Group Relative Policy Optimization (GRPO) was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024).

Subject
Group Relative Policy Optimization (GRPO)
Predicate
introduced_in_paper
Object
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Shao et al., 2024)
Primary source · preprint · 2024-02-05
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models · arXiv (Shao, Wang, Zhu, Xu, Song, Bi, Zhang, Zhang, Li, Wu, Guo; DeepSeek AI)
Last verified 2026-05-31 · 3 sources · f73e50d63643df21