
Paper with Code: You can now run LLMs without Matrix Multiplications

Saw this paper. In essence, it claims that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. By using an optimised kernel during inference, the authors report that their model's memory consumption can be reduced by more than 10× compared to un-optimised models. Source:

Implementation for MatMul-free LM:

https://github.com/ridgerchu/matmulfreellm
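For anyone wondering how a dense layer can work without multiplications, here is a rough sketch of the idea as I understand it. It assumes the weights are constrained to the three values -1, 0 and +1 (ternary weights), which is my reading of the paper and is not stated in the post above. Under that assumption, each output is just a sum of some inputs minus a sum of others. The function names `ternary_linear` and `reference_matmul` are made up for illustration and are not taken from the linked repo.

```python
import random


def ternary_linear(x, w):
    """Compute y[j] = sum_i x[i] * w[i][j] for w[i][j] in {-1, 0, +1}.

    Only additions and subtractions are used; no multiplications.
    """
    n_out = len(w[0])
    y = [0.0] * n_out
    for i, xi in enumerate(x):
        for j in range(n_out):
            wij = w[i][j]
            if wij == 1:
                y[j] += xi      # +1 weight: add the input
            elif wij == -1:
                y[j] -= xi      # -1 weight: subtract the input
            # 0 weight: the input is skipped entirely
    return y


def reference_matmul(x, w):
    """Ordinary vector-matrix product, used only to check the result."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


if __name__ == "__main__":
    random.seed(0)
    x = [random.uniform(-1.0, 1.0) for _ in range(8)]
    w = [[random.choice((-1, 0, 1)) for _ in range(4)] for _ in range(8)]

    y_add = ternary_linear(x, w)
    y_ref = reference_matmul(x, w)

    # Both paths should agree up to floating-point rounding.
    assert all(abs(a - b) < 1e-9 for a, b in zip(y_add, y_ref))
    print(y_add)
```

This is only a toy loop in plain Python. The actual implementation in the repo uses optimised GPU kernels, and the memory savings quoted above come from that kernel, not from anything shown here.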
