Explore Uncertainty-Guided Checkpoint Selection, a novel method for optimizing LLM reinforcement finetuning by leveraging per-sample uncertainty to identify ...
Level: advanced
By Manh Nguyen, Dung Nguyen, Dai Do, Svetha Venkatesh, Hung Le
Category: research