arXiv:2604.16475v1 Announce Type: new Abstract: Current Large Language Models (LLMs) rely primarily on large-scale dense matrix multiplications. Inspired by the brain's information processing mechanism, we explore a fundamental question: how can the brain's spike-driven characteristics be effectively integrated into LLM inference? Spiking Neural Networks (SNNs) are inherently spike-driven, and some works have attempted to combine SNNs with Transformers. However, achieving spike-driven LLMs with billions of parameters that rely solely on sparse additions remains a challenge in the SNN field. To address the limited representational capacity and sparsity of existing spike encoding schemes at the LLM level, we propose SDLLM, a spike-driven large language model that eliminates dense matrix multiplications through sparse addition operations. Specifically, we use the plug-and-play gamma-SQP two-step spike encoding method to ensure that the quantization process aligns wi...
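To make the "sparse additions instead of dense multiplications" idea concrete, here is a minimal sketch of the general principle only. It is not the SDLLM implementation and does not model the gamma-SQP encoding, whose details are not given in the text above. It assumes the activations have already been encoded as a binary spike vector; the function names and the toy weights are hypothetical.

```python
# Illustrative only: NOT the SDLLM / gamma-SQP method. Assumes activations
# are already binary spikes (0 or 1); all names and values are hypothetical.

def dense_matvec(weights, x):
    """Reference dense product: one multiply-accumulate per weight."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def spike_driven_matvec(weights, spikes):
    """Same result for binary inputs, using additions only.

    For each output row, only the weights at positions where a spike fired
    are accumulated; silent positions are skipped entirely.
    """
    active = [j for j, s in enumerate(spikes) if s]  # indices that fired
    out = []
    for row in weights:
        acc = 0.0
        for j in active:
            acc += row[j]  # addition only, no multiplication
        out.append(acc)
    return out


weights = [
    [0.5, -1.0, 2.0, 0.25],
    [1.5, 0.0, -0.5, 1.0],
]
spikes = [1, 0, 0, 1]  # two of four inputs fire

dense_result = dense_matvec(weights, spikes)
spike_result = spike_driven_matvec(weights, spikes)
print(dense_result, spike_result)
```

In this sketch the amount of work scales with the number of active spikes rather than with the full input width, which is why sparsity of the spike encoding matters for a spike-driven model.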