A Unified Understanding of Offline Data Selection and Online Self-refining Generation for Post-training LLMs
This research introduces a bilevel optimization framework for offline data selection and online self-refining generation to enhance LLM post-training. It det...