ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning

This research introduces Entropy Trend Reward (ETR), a novel trajectory-aware objective that optimizes Chain-of-Thought reasoning by managing uncertainty tre...

Level: advanced

By Xuan Xiong, Huan Liu, Li Gu, Zhixiang Chi, Yue Qiu, Yuanhao Yu, Yang Wang

Category: research