Explore Online Supervised Fine-Tuning (OSFT), a novel reward-free protocol that leverages latent pretraining preferences to enhance LLM reasoning. This resea...
Level: advanced
By Mengqi Li, Lei Zhao, Anthony Man-Cho So, Ruoyu Sun, Xiao Li
Category: research