Issues with the effectiveness of W4A16 quantization using AWQ #157

RanchiZhao · 2024-12-10T06:18:22Z

For quantizing the LLM part of VILA, I would like to know why AWQ was chosen instead of GPTQ. Have you tried using GPTQ to quantize the LLM part? Does AWQ perform better?

gheinrich pushed a commit to gheinrich/VILA that referenced this issue Dec 16, 2024

Add ZigzagRing Support (NVlabs#157)

118c349
