Group Relative Policy Optimization (GRPO) was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024).
Subject
Group Relative Policy Optimization (GRPO)
Predicate
introduced_in_paper
Object
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Shao et al., 2024)
Primary source · preprint · 2024-02-05
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models — arXiv (Shao, Wang, Zhu, Xu, Song, Bi, Zhang, Zhang, Li, Wu, Guo — DeepSeek AI)