Paper with Code: You can now run LLMs without Matrix Multiplications
Saw this paper: in essence, MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. By using an optimised kernel during inference, the model's memory consumption can be reduced by more than 10× compared with unoptimised models. Source:
Implementation for MatMul-free LM. Contribute to ridgerchu/matmulfreellm development by creating an account on GitHub.
https://github.com/ridgerchu/matmulfreellm
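The key idea behind such MatMul-free models is replacing full-precision weight matrices with ternary weights in {-1, 0, +1}, so a "matrix multiply" collapses into additions and subtractions. A minimal sketch of that trick (this is an illustrative toy, not the actual kernel from the repo):

```python
# Illustrative sketch only: a dense layer whose weights are ternary
# {-1, 0, +1}. With such weights, y = W @ x needs no multiplications;
# each output element is a signed sum of selected inputs.

def ternary_matvec(W, x):
    """Compute W @ x where every W[i][j] is -1, 0, or +1, using only adds/subtracts."""
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # weight +1: add the input
            elif w == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0 contributes nothing (sparsity for free)
        y.append(acc)
    return y

W = [[1, -1, 0],
     [0, 1, 1]]
x = [2.0, 3.0, 5.0]
print(ternary_matvec(W, x))  # [-1.0, 8.0]
```

Since each ternary weight fits in under 2 bits rather than 16 or 32, this is also where the large memory savings come from.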
by Kendall Vernon, Goldman Sachs
AI Discovers a Faster Matrix Multiplication Algorithm
Google DeepMind
https://www.nature.com/articles/s41586-022-05172-4.pdf
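What "faster matrix multiplication" means here is schemes that use fewer scalar multiplications than the naive method. The classic instance is Strassen's 2×2 scheme (7 multiplications instead of 8), the kind of decomposition the DeepMind work searches for automatically. A quick sketch:

```python
# Strassen's 2x2 matrix multiply: 7 scalar multiplications instead of 8.
# Applied recursively to blocks, this yields a sub-cubic algorithm; the
# AlphaTensor paper searches for such low-rank schemes automatically.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    # The 7 products (each a single multiplication of sums/differences)
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine into the four entries of A @ B
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B))  # [[19, 22], [43, 50]]
```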