This research investigates how Multi-Token Prediction enhances Transformer reasoning by inducing a reverse reasoning process and leveraging gradient decoupli...
Level: advanced
By Jianhao Huang
Category: research