Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap
This research explores the theoretical dynamics of LLM self-improvement through the solver-verifier gap, demonstrating how external data can optimize trainin...