Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers
This research dissects the hidden costs of WebGPU dispatch in LLM inference, revealing how naive benchmarks mislead and why backend selection dictates performance.
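To make the "naive benchmarks mislead" point concrete, here is a minimal sketch (not the paper's actual harness) of one common pitfall: `device.queue.submit()` returns as soon as commands are queued, so a CPU-side timer stopped right after submit mostly measures JavaScript encoding cost rather than GPU dispatch. The `pipeline` and `bindGroup` parameters are assumed to be created elsewhere.

```ts
// Sketch: contrast a naive CPU-side timer (stops after submit, before the
// GPU has run anything) with one that awaits queue drain. Assumes WebGPU
// type declarations (e.g. @webgpu/types) are available.
function submitOneDispatch(
  device: GPUDevice,
  pipeline: GPUComputePipeline,
  bindGroup: GPUBindGroup,
): void {
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(1); // trivially small workload: overhead dominates
  pass.end();
  device.queue.submit([encoder.finish()]);
}

export async function compareTimings(
  device: GPUDevice,
  pipeline: GPUComputePipeline,
  bindGroup: GPUBindGroup,
  iterations = 1000,
): Promise<{ naiveMsPerDispatch: number; syncedMsPerDispatch: number }> {
  // Naive benchmark: queue.submit() is asynchronous, so this loop times
  // command encoding and submission, not GPU execution.
  let t0 = performance.now();
  for (let i = 0; i < iterations; i++) submitOneDispatch(device, pipeline, bindGroup);
  const naiveMs = performance.now() - t0;

  // Synced benchmark: wait for the GPU to drain the queue before stopping
  // the clock, so per-dispatch driver/backend overhead is included.
  t0 = performance.now();
  for (let i = 0; i < iterations; i++) submitOneDispatch(device, pipeline, bindGroup);
  await device.queue.onSubmittedWorkDone();
  const syncedMs = performance.now() - t0;

  return {
    naiveMsPerDispatch: naiveMs / iterations,
    syncedMsPerDispatch: syncedMs / iterations,
  };
}
```

A further refinement, where supported, would be WebGPU's optional `timestamp-query` feature to isolate GPU-side time per pass; its availability and resolution vary across browsers and vendors, which is part of why cross-backend comparisons are subtle.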