Interesting stuff! I don't think I can implement this approach - it seems quite complicated. But I got inspired by this idea and started implementing an n-bit quantisation + matrix multiplication in |
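For anyone curious what n-bit quantisation plus matrix multiplication could look like, here is a minimal pure-Python sketch. It assumes symmetric quantisation with a single scale per matrix, which is only one of several possible schemes and is not necessarily what the implementation mentioned above does. The function names `quantize`, `quantized_matmul`, and `matmul` are made up for illustration.

```python
def quantize(matrix, bits):
    """Symmetric quantisation of a matrix to signed `bits`-bit integers (one scale per matrix)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits, 7 for 4 bits
    peak = max(abs(x) for row in matrix for x in row)
    scale = peak / qmax if peak > 0 else 1.0        # avoid dividing by zero for an all-zero matrix
    q = [[max(-qmax, min(qmax, round(x / scale))) for x in row] for row in matrix]
    return q, scale


def quantized_matmul(a, b, bits=8):
    """Approximate a @ b: multiply the integer matrices, then rescale back to floats."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    cols_b = list(zip(*qb))                         # columns of b, for row-by-column dot products
    return [[sum(x * y for x, y in zip(row, col)) * sa * sb for col in cols_b]
            for row in qa]


def matmul(a, b):
    """Exact reference product, used only to measure the quantisation error."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]
```

Comparing `quantized_matmul(a, b, bits)` against `matmul(a, b)` for a few values of `bits` gives a quick feel for how much accuracy is lost as the bit width shrinks.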
-
Hi!
Great work with this project, love it!
Since this will probably always run on a CPU, the project reminded me of a paper from a year or so back on fast approximate matrix multiplication on CPUs. It reportedly delivers speedups of around 100x over exact matrix multiplication:
https://arxiv.org/abs/2106.10860
There is a C++ implementation here:
https://github.com/dblalock/bolt
Maybe this would get things to GPU speeds on CPU!
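To give a rough idea of how lookup tables can stand in for multiplications, here is a small pure-Python sketch in the style of product quantisation. It is not the paper's algorithm and not the Bolt API: the prototypes are simply passed in rather than learned, encoding uses a plain nearest-prototype search, and every name (`encode`, `build_tables`, `approx_matmul`, `sub_len`) is hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def encode(row, prototypes, sub_len):
    """Replace each length-`sub_len` chunk of `row` by the index of its nearest prototype."""
    codes = []
    for s, protos in enumerate(prototypes):
        chunk = row[s * sub_len:(s + 1) * sub_len]
        codes.append(min(range(len(protos)), key=lambda k: sq_dist(chunk, protos[k])))
    return codes


def build_tables(b, prototypes, sub_len):
    """Precompute tables[s][k][j] = dot(prototype k of chunk s, matching rows of column j of b)."""
    n_cols = len(b[0])
    tables = []
    for s, protos in enumerate(prototypes):
        rows = b[s * sub_len:(s + 1) * sub_len]
        tables.append([[sum(p[i] * rows[i][j] for i in range(sub_len))
                        for j in range(n_cols)] for p in protos])
    return tables


def approx_matmul(a, tables, prototypes, sub_len):
    """Approximate a @ b; after encoding, each output entry is a sum of table lookups."""
    n_cols = len(tables[0][0])
    out = []
    for row in a:
        codes = encode(row, prototypes, sub_len)   # this sketch still computes distances here
        out.append([sum(tables[s][k][j] for s, k in enumerate(codes))
                    for j in range(n_cols)])
    return out
```

The tables depend only on `b` and the prototypes, so they can be built once when `b` is a fixed weight matrix. The result is exact when a row of `a` is made up entirely of prototypes, and the error grows as rows drift away from them.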