SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization

This research investigates how data overlap between the SFT and GRPO stages affects Lean 4 autoformalization, revealing that disjoint datasets significantly boos...

Level: advanced

By Xiaole Su

Category: research