Group Relative Policy Optimization (GRPO) was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024).

SourceScore VERITAS · verified claim · 92% confidence

Subject

Group Relative Policy Optimization (GRPO)

Predicate

introduced_in_paper

Object

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Shao et al., 2024)

Primary source · preprint · 2024-02-05

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models — arXiv (Shao, Wang, Zhu, Xu, Song, Bi, Zhang, Zhang, Li, Wu, Guo — DeepSeek AI)

Last verified 2026-05-31 · 3 sources · f73e50d63643df21