Tag

deepseekmath

1 verified claim carries this tag. Every verified claim has at least two primary sources and an HMAC-SHA256 signature.

Group Relative Policy Optimization (GRPO) was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024).
f73e50d63643df21 · 3 sources · 92% confidence

Related tags

2024 (1) · reasoning (1) · rlhf (1) · deepseek (1) · reinforcement-learning (1) · group-relative-policy-optimization (1) · grpo (1) · shao (1)