Test-Time Scaling Makes Overtraining Compute-Optimal

This research introduces Train-to-Test scaling laws, revealing that heavily overtrained models significantly outperform standard approaches when inference co...

Level: advanced

By Nicholas Roberts

Category: research