TeLLMe v2: An Efficient End-to-End Ternary LLM Prefill and Decode Accelerator with Table-Lookup Matmul on Edge FPGAs
This paper presents TeLLMe v2, an FPGA-based accelerator that uses table-lookup matrix multiplication over ternary weights to deliver efficient LLM inference under st...
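
To make the table-lookup idea in the title concrete, here is a minimal software sketch of a ternary matrix-vector product computed through lookup tables. It is a generic illustration, not the accelerator's actual datapath: the group size `GROUP = 3`, the base-3 weight encoding, and the function names `encode_weights`, `build_tables`, and `tl_matvec` are all assumptions introduced here. The point it shows is that with weights restricted to {-1, 0, 1}, each group of activations has only 3^GROUP possible partial sums, so they can be precomputed once and then selected by index instead of multiplied.

```python
from itertools import product

GROUP = 3  # assumed group size: 3**3 = 27 table entries per activation group


def encode_weights(row):
    """Pack a ternary weight row into one base-3 index per group of GROUP weights."""
    assert len(row) % GROUP == 0, "row length must be a multiple of GROUP"
    codes = []
    for g in range(0, len(row), GROUP):
        code = 0
        for w in row[g:g + GROUP]:
            assert w in (-1, 0, 1), "weights must be ternary"
            code = code * 3 + (w + 1)  # map -1, 0, 1 to base-3 digits 0, 1, 2
        codes.append(code)
    return codes


def build_tables(x):
    """For each activation group, precompute the partial sum of every ternary pattern."""
    tables = []
    for g in range(0, len(x), GROUP):
        chunk = x[g:g + GROUP]
        # product() enumerates patterns most-significant digit first,
        # which matches the index order produced by encode_weights.
        tables.append([
            sum(w * a for w, a in zip(pattern, chunk))
            for pattern in product((-1, 0, 1), repeat=GROUP)
        ])
    return tables


def tl_matvec(weight_codes, x):
    """Matrix-vector product using only table lookups and additions."""
    tables = build_tables(x)  # built once, reused by every output row
    return [sum(t[c] for t, c in zip(tables, codes)) for codes in weight_codes]


W = [[1, 0, -1, 1, 1, 0],
     [0, 0, 0, -1, -1, -1]]
x = [1, 2, 3, 4, 5, 6]
y = tl_matvec([encode_weights(r) for r in W], x)
```

In this sketch the tables depend only on the activations, so their construction cost is shared across all output rows, and the per-row work reduces to one lookup and one addition per group.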