Tag
1 verified claim carries this tag. Each claim has 2+ primary sources and an HMAC-SHA256 signature.
Group Relative Policy Optimization (GRPO) was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024).
f73e50d63643df21 · 3 sources · 92% confidence
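
The listing says each claim carries an HMAC-SHA256 signature but does not describe how it is produced. The following is a minimal sketch of how such a signature could be computed and checked with Python's standard library. The key, the payload format, and the function names `sign_claim` and `verify_claim` are all hypothetical; the actual scheme used for these claims is not given here.

```python
import hashlib
import hmac

# Hypothetical values for illustration only: the real signing key and
# the exact bytes that get signed are not stated in the listing.
SECRET_KEY = b"example-secret-key"
CLAIM_TEXT = (
    "Group Relative Policy Optimization (GRPO) was introduced in "
    "DeepSeekMath (Shao et al., 2024)."
)


def sign_claim(claim_text: str, key: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the UTF-8 claim text."""
    return hmac.new(key, claim_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_claim(claim_text: str, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_claim(claim_text, key)
    return hmac.compare_digest(expected, signature)


signature = sign_claim(CLAIM_TEXT, SECRET_KEY)
is_valid = verify_claim(CLAIM_TEXT, signature, SECRET_KEY)
```

Under these assumptions, any change to the claim text or the use of a different key makes verification fail, which is the property a signed claim relies on. `hmac.compare_digest` is used instead of `==` to avoid leaking timing information during the comparison.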