When to Reason: Semantic Router for vLLM

This research explores semantic routing in vLLM to dynamically apply reasoning only when necessary, significantly improving accuracy while reducing latency a...

Level: advanced

By Chen Wang and 6 other authors

Category: research