Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers

This research dissects the hidden costs of WebGPU dispatch overhead in LLM inference, revealing how naive benchmarks mislead and why backend selection dictates...

Level: advanced

By Jędrzej Maczan

Category: research