This research compares Evolution Strategies and GRPO for LLM post-training, revealing how gradient-free methods achieve similar accuracy through distinct geo...
Level: advanced
By William Hoy
Category: research