This research establishes memory bandwidth and interconnect latency as the dominant constraints in LLM inference, proposing advanced architectural solutions ...
Level: advanced
By Xiaoyu Ma, David Patterson
Category: research