Prompt-Aware Scheduling for Low-Latency LLM Serving

This research introduces PARS, a scheduler that uses a margin ranking loss to approximate shortest-job-first (SJF) scheduling, significantly reducing latency ...

Level: advanced

By Unknown

Category: research
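
The summary above names two ideas: a pairwise margin ranking loss and shortest-job-first ordering. The sketch below illustrates both in plain Python. It is a generic illustration and not the PARS implementation: the function names, the `margin` default, and the example scores are all hypothetical, and the summary does not say how PARS actually produces its scores.

```python
from typing import List, Sequence


def margin_ranking_loss(score_a: float, score_b: float,
                        target: int, margin: float = 1.0) -> float:
    """Standard pairwise hinge loss: max(0, -target * (score_a - score_b) + margin).

    target = +1 means request A should score higher (run longer) than B;
    target = -1 means the reverse. The loss is zero once the two scores
    are ordered correctly and separated by at least `margin`.
    """
    return max(0.0, -target * (score_a - score_b) + margin)


def sjf_order(scores: Sequence[float]) -> List[int]:
    """Return request indices sorted by ascending predicted score.

    Only the relative order of the scores matters, which is why a ranking
    loss can be enough to approximate shortest-job-first.
    """
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Hypothetical predicted scores for three queued requests (assumed values).
example_scores = [3.0, 0.5, 1.5]
example_order = sjf_order(example_scores)

# Correctly ordered pair (A is longer, and scored higher by more than the margin).
loss_correct = margin_ranking_loss(3.0, 0.5, target=1)
# Wrongly ordered pair (A is longer, but scored lower).
loss_wrong = margin_ranking_loss(0.5, 3.0, target=1)
```

With these assumed scores, the request at index 1 would be served first, and the wrongly ordered pair is the only one that contributes a nonzero loss. A trained predictor would replace the hard-coded scores.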