Tag: reinforcement-learning

All the articles with the tag "reinforcement-learning".

From GRPO to ExGRPO: Learning to Reason from Experience

9 Oct, 2025

A compact reproduction of ExGRPO, showing how reusing confident past reasoning trajectories can make reinforcement learning more stable and sample-efficient than standard GRPO.