Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets

Discover how prioritizing hard examples in GRPO post-training can yield performance gains of up to 47% while making the most of a limited annotation budget. This research offers...

Level: advanced

By Unknown

Category: research