RLP: Reinforcement as a Pretraining Objective

Explore RLP, a novel reinforcement-learning pretraining objective that integrates chain-of-thought reasoning to significantly boost model performance on comp...

Level: advanced

By Unknown

Category: research