Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training

This research compares Evolution Strategies and GRPO for LLM post-training, revealing how gradient-free methods achieve similar accuracy through distinct geo...

Level: advanced

By William Hoy

Category: research