Tag: reinforcement-learning
All the articles with the tag "reinforcement-learning".
From GRPO to ExGRPO: Learning to Reason from Experience
A compact reproduction of ExGRPO, showing how reusing confident past reasoning trajectories can make reinforcement learning more stable and sample-efficient than standard GRPO.