I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.
Hello Turboderp,
I thought this might interest you; the paper sounds great. I realize exl2 takes a very different approach to quantization, so I don't expect anything to come of this directly. I'm simply sharing some fresh ideas.
From https://www.reddit.com/r/LocalLLaMA/comments/1ggwrx6/new_quantization_method_qtip_quantization_with/:
New Quantization Method -- QTIP: Quantization with Trellises and Incoherence Processing
We're pleased to introduce QTIP, a new LLM quantization algorithm that uses trellis-coded quantization and incoherence processing to achieve a state-of-the-art combination of speed and quantization quality.
Paper (NeurIPS 2024 Spotlight): https://arxiv.org/pdf/2406.11235
Codebase + inference kernels: https://github.com/Cornell-RelaxML/qtip
Prequantized models (including 2 Bit 405B Instruct): https://huggingface.co/collections/relaxml/qtip-quantized-models-66fa253ad3186746f4b62803
QTIP has significantly better quality than QuIP# while being just as fast. QTIP is also on par with or better than PV-Tuning while being much faster (~2-3x).
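
In case it helps to see the core idea: below is a minimal toy sketch of the "trellis" part (trellis-coded quantization via Viterbi search), not QTIP's actual trellis, codebook, or bit allocation. The states, branch levels, and function name are made up for illustration; incoherence processing is, roughly speaking, a random rotation applied to the weights before a step like this.

```python
import numpy as np

# Toy trellis-coded quantization (TCQ) via Viterbi search.
# NOT the QTIP algorithm -- just an illustration of the general idea:
# each value is quantized with a codebook that depends on the current
# trellis state, and the Viterbi algorithm picks the minimum-distortion path.

# Hypothetical toy trellis: 4 states, 2 branches per state (1 bit/sample).
# NEXT_STATE[s][b] is the state reached from state s on branch b;
# LEVELS[s][b] is the reproduction value emitted on that branch.
NEXT_STATE = np.array([[0, 1], [2, 3], [0, 1], [2, 3]])
LEVELS = np.array([[-1.5, -0.5], [0.5, 1.5], [-1.0, 0.0], [0.0, 1.0]])

def tcq_quantize(x):
    """Quantize the sequence x at 1 bit/sample using the toy trellis above."""
    n, S = len(x), NEXT_STATE.shape[0]
    cost = np.full(S, np.inf)
    cost[0] = 0.0                           # start in state 0
    back = np.zeros((n, S, 2), dtype=int)   # (prev_state, branch) per step/state

    for t in range(n):
        new_cost = np.full(S, np.inf)
        for s in range(S):
            if not np.isfinite(cost[s]):
                continue
            for b in range(2):
                ns = NEXT_STATE[s, b]
                c = cost[s] + (x[t] - LEVELS[s, b]) ** 2
                if c < new_cost[ns]:
                    new_cost[ns] = c
                    back[t, ns] = (s, b)
        cost = new_cost

    # Trace back the minimum-distortion path to recover bits and levels.
    s = int(np.argmin(cost))
    bits, xhat = [], []
    for t in reversed(range(n)):
        ps, b = back[t, s]
        bits.append(int(b))
        xhat.append(LEVELS[ps, b])
        s = ps
    return bits[::-1], np.array(xhat[::-1]), float(cost.min())

x = np.random.randn(16)
bits, xhat, total_sq_err = tcq_quantize(x)
print("bits:", bits)
print("mean squared error:", total_sq_err / len(x))
```

The appeal over plain scalar quantization is that the state-dependent codebooks let a 1-bit-per-sample code behave more like a larger vector codebook, while decoding stays a cheap table walk; the paper and the linked kernels handle the real version of this at scale.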