TeLLMe v2: An Efficient End-to-End Ternary LLM Prefill and Decode Accelerator with Table-Lookup Matmul on Edge FPGAs

This research details TeLLMe v2, an FPGA-based accelerator that uses ternary-weight matrix multiplication to achieve efficient LLM inference under st...
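For readers new to the idea in the title, the block below is a minimal, generic sketch of table-lookup matrix-vector multiplication with ternary weights (each weight in {-1, 0, +1}). It is not TeLLMe v2's hardware design, whose details are not given here. It assumes weights are split into groups of `g` consecutive entries, that `g` divides the input length, and that all function names (`build_group_lut`, `encode_row`, `lut_matvec`) are hypothetical. For each group of activations, all 3^g signed sums are precomputed once. Each output row then needs only one table lookup and one addition per group, instead of `g` multiply-adds.

```python
import itertools
from typing import List


def build_group_lut(x_group: List[float]) -> List[float]:
    # Entry k holds dot(pattern_k, x_group), where pattern_k is the k-th
    # ternary pattern in itertools.product order (first weight = most
    # significant base-3 digit, with digit = weight + 1).
    return [sum(w * x for w, x in zip(p, x_group))
            for p in itertools.product((-1, 0, 1), repeat=len(x_group))]


def encode_row(row: List[int], g: int) -> List[int]:
    # Hypothetical offline step: map each length-g slice of a ternary
    # weight row to its base-3 index into the group's lookup table.
    idx = []
    for j in range(0, len(row), g):
        k = 0
        for w in row[j:j + g]:
            k = k * 3 + (w + 1)  # Horner's rule, most significant digit first
        idx.append(k)
    return idx


def lut_matvec(W: List[List[int]], x: List[float], g: int) -> List[float]:
    # Compute y = W @ x for ternary W using only lookups and additions.
    assert len(x) % g == 0, "group size must divide the input length"
    # One table per activation group, built once and shared by every row.
    luts = [build_group_lut(x[j:j + g]) for j in range(0, len(x), g)]
    encoded = [encode_row(row, g) for row in W]
    return [sum(lut[k] for lut, k in zip(luts, idx)) for idx in encoded]


if __name__ == "__main__":
    W = [[1, -1, 0, 1], [0, 0, -1, 1], [-1, 1, 1, 0]]
    x = [0.5, 2.0, -1.0, 3.0]
    print(lut_matvec(W, x, g=2))  # [1.5, 4.0, 0.5]
```

The table size grows as 3^g, so `g` trades lookup-table storage against the number of additions per row. This tension is the kind of trade-off a hardware design would need to balance.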

Level: advanced

By Ye Qiao and 4 other authors

Category: research