Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking

This research introduces Neuralink, a novel approach that leverages sparsity and flash-memory co-design to accelerate LLM inference on smartphones, achieving 1.49x low...

Level: advanced

By Unknown

Category: research