Posts
All the articles I've posted.
From GRPO to ExGRPO: Learning to Reason from Experience
A compact reproduction of ExGRPO, showing how reusing confident past reasoning trajectories can make reinforcement learning more stable and sample-efficient than standard GRPO.