I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.
Hello Turboderp,
I thought this might interest you; the paper sounds great. I realize exl2 takes a very different approach to quantization, so I don't expect anything to come of this directly. I'm simply sharing some fresh ideas.
From https://www.reddit.com/r/LocalLLaMA/comments/1ggwrx6/new_quantization_method_qtip_quantization_with/:
New Quantization Method -- QTIP: Quantization with Trellises and Incoherence Processing
We're pleased to introduce QTIP, a new LLM quantization algorithm that uses trellis-coded quantization and incoherence processing to achieve a state-of-the-art combination of speed and quantization quality.
Paper (NeurIPS 2024 Spotlight): https://arxiv.org/pdf/2406.11235
Codebase + inference kernels: https://github.com/Cornell-RelaxML/qtip
Prequantized models (including 2 Bit 405B Instruct): https://huggingface.co/collections/relaxml/qtip-quantized-models-66fa253ad3186746f4b62803
QTIP has significantly better quality than QuIP# while being just as fast. QTIP is also on par with or better than PV-Tuning while being much faster (~2-3x).
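
In case it helps to see the core idea: below is a minimal toy sketch of the "trellis" part (trellis-coded quantization via Viterbi search), not QTIP's actual trellis, codebook, or bit allocation. The states, branch levels, and function name are made up for illustration; incoherence processing is, roughly speaking, a random rotation applied to the weights before a step like this.

```python
import numpy as np

# Toy trellis-coded quantization (TCQ) via Viterbi search.
# NOT the QTIP algorithm -- just an illustration of the general idea:
# each value is quantized with a codebook that depends on the current
# trellis state, and the Viterbi algorithm picks the minimum-distortion path.

# Hypothetical toy trellis: 4 states, 2 branches per state (1 bit/sample).
# NEXT_STATE[s][b] is the state reached from state s on branch b;
# LEVELS[s][b] is the reproduction value emitted on that branch.
NEXT_STATE = np.array([[0, 1], [2, 3], [0, 1], [2, 3]])
LEVELS = np.array([[-1.5, -0.5], [0.5, 1.5], [-1.0, 0.0], [0.0, 1.0]])

def tcq_quantize(x):
    """Quantize the sequence x at 1 bit/sample using the toy trellis above."""
    n, S = len(x), NEXT_STATE.shape[0]
    cost = np.full(S, np.inf)
    cost[0] = 0.0                           # start in state 0
    back = np.zeros((n, S, 2), dtype=int)   # (prev_state, branch) per step/state

    for t in range(n):
        new_cost = np.full(S, np.inf)
        for s in range(S):
            if not np.isfinite(cost[s]):
                continue
            for b in range(2):
                ns = NEXT_STATE[s, b]
                c = cost[s] + (x[t] - LEVELS[s, b]) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    back[t, ns] = (s, b)
        cost = new_cost

    # Trace back the minimum-distortion path to recover bits and levels.
    s = int(np.argmin(cost))
    bits, xhat = [], []
    for t in reversed(range(n)):
        ps, b = back[t, s]
        bits.append(int(b))
        xhat.append(LEVELS[ps, b])
        s = ps
    return bits[::-1], np.array(xhat[::-1]), float(cost.min())

x = np.random.randn(16)
bits, xhat, total_sq_err = tcq_quantize(x)
print("bits:", bits)
print("mean squared error:", total_sq_err / len(x))
```

The appeal over plain scalar quantization is that the state-dependent codebooks let a 1-bit-per-sample code behave more like a larger vector codebook, while decoding stays a cheap table walk; the paper and the linked kernels handle the real version of this at scale.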