This research explores semantic routing in vLLM to dynamically apply reasoning only when necessary, significantly improving accuracy while reducing latency a...
Level: advanced
By Chen Wang and 6 other authors
Category: research