Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving

This research introduces a training-free online routing algorithm that uses approximate nearest neighbor (ANN) search to optimize high-volume multi-LLM serving with asymptotic optimality ...

Level: advanced

By Fangzhou Wu, Sandeep Silwal

Category: research