Explore RLP, a novel reinforcement learning pretraining objective that integrates chain-of-thought reasoning to significantly boost model performance on comp...
Level: advanced
By Unknown
Category: research