Posts

All the articles I've posted.

From GRPO to ExGRPO: Learning to Reason from Experience

9 Oct, 2025

A compact reproduction of ExGRPO, showing how reusing confident past reasoning trajectories can make reinforcement learning more stable and sample-efficient than standard GRPO.