[PAPER] New quant method with SOTA quality and speed: QTIP #668

TyraVex opened this issue Nov 1, 2024 · 0 comments

TyraVex commented Nov 1, 2024

Hello Turboderp,

I believe this could interest you; the paper sounds great. Since exl2 takes a very different approach to quantization, I don't expect anything to come of this directly; I simply want to share some fresh ideas.

From https://www.reddit.com/r/LocalLLaMA/comments/1ggwrx6/new_quantization_method_qtip_quantization_with/:

New Quantization Method -- QTIP: Quantization with Trellises and Incoherence Processing

We're pleased to introduce QTIP, a new LLM quantization algorithm that uses trellis coded quantization and incoherence processing to achieve a state-of-the-art combination of speed and quantization quality.

Paper (NeurIPS 2024 Spotlight): https://arxiv.org/pdf/2406.11235

Codebase + inference kernels: https://github.com/Cornell-RelaxML/qtip

Prequantized models (including 2-bit 405B Instruct): https://huggingface.co/collections/relaxml/qtip-quantized-models-66fa253ad3186746f4b62803

QTIP has significantly better quality than QuIP# while being just as fast. QTIP is also on par with or better than PV-Tuning while being much faster (~2-3x).
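
For anyone unfamiliar with trellis coded quantization (TCQ), here is a minimal Python sketch of the general idea — this is my own toy illustration, not code from the QTIP repo. It quantizes a 1-D weight sequence at 1 bit per weight by running Viterbi dynamic programming over a tiny 4-state "bitshift"-style trellis, where the reproduction level emitted at each step depends on the current state and the chosen bit. The `trellis_quantize` function, the transition table, and the level table are all invented for illustration; QTIP's actual trellises, codebooks, and inference kernels are in the repo linked above.

```python
import numpy as np

def trellis_quantize(x, levels, transitions):
    """Quantize the 1-D signal x with trellis-coded quantization (TCQ).

    levels[s][b]      -- reproduction value emitted by branch b out of state s
    transitions[s][b] -- next state reached by that branch

    Viterbi dynamic programming finds the bit sequence whose path through
    the trellis minimizes total squared reconstruction error.
    """
    n, S, B = len(x), len(transitions), len(transitions[0])
    cost = np.full(S, np.inf)
    cost[0] = 0.0                       # start in state 0
    back = np.zeros((n, S), dtype=int)  # packed (prev_state, branch) per step

    for i in range(n):
        new_cost = np.full(S, np.inf)
        for s in range(S):
            if not np.isfinite(cost[s]):
                continue                # state s unreachable at this step
            for b in range(B):
                t = transitions[s][b]
                c = cost[s] + (x[i] - levels[s][b]) ** 2
                if c < new_cost[t]:
                    new_cost[t] = c
                    back[i, t] = s * B + b
        cost = new_cost

    # Trace the cheapest path backwards to recover bits and reconstruction.
    s = int(np.argmin(cost))
    bits, recon = [], []
    for i in range(n - 1, -1, -1):
        prev, b = divmod(int(back[i, s]), B)
        bits.append(b)
        recon.append(levels[prev][b])
        s = prev
    return bits[::-1], np.array(recon[::-1])

# A toy 4-state, 1-bit-per-weight bitshift trellis: the state is the last two
# bits emitted, so next_state = ((state << 1) | bit) & 3. Levels are arbitrary.
transitions = [[0, 1], [2, 3], [0, 1], [2, 3]]
levels = [[-1.5, 0.5], [-0.5, 1.5], [0.5, -1.5], [1.5, -0.5]]

w = np.random.default_rng(0).standard_normal(16)
bits, w_hat = trellis_quantize(w, levels, transitions)
print("bits:", bits)
print("mse :", float(np.mean((w - w_hat) ** 2)))
```

The intuition for why this can beat scalar quantization at the same bitrate is that the level emitted for each weight depends on the path history, so the effective set of reachable reproduction sequences is much richer than 2^1 choices per weight. As I understand it, QTIP pairs this with incoherence processing, i.e. rotating the weight matrix with a structured random orthogonal transform so its entries look roughly i.i.d. Gaussian before the trellis quantizer sees them.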

- [x] I have looked for similar requests before submitting this one.
- [x] I understand that the developers have lives and my issue will be answered when possible.
- [x] I understand the developers of this program are human, and I will make my requests politely.