The quantization tools come from AutoAWQ (https://github.com/casper-hansen/AutoAWQ) and Marlin (https://github.com/IST-DASLab/marlin).
Requires a GPU with compute capability sm_80 or higher (RTX 30 series and newer).
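To check the GPU requirement before installing anything, a minimal sketch along these lines may help. It assumes PyTorch is installed and uses `torch.cuda.get_device_capability`; the helper names `supports_marlin` and `detect_capability` are hypothetical and not part of AutoAWQ or Marlin.

```python
from typing import Optional, Tuple

# sm_80 corresponds to compute capability (8, 0).
MIN_CAPABILITY = (8, 0)


def supports_marlin(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is sm_80 or higher."""
    return tuple(capability) >= MIN_CAPABILITY


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the capability of GPU 0, or None if it cannot be determined.

    Assumption: PyTorch is the way the GPU is queried; None is returned
    when PyTorch is missing or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("Could not detect a CUDA GPU (is PyTorch with CUDA installed?)")
    elif supports_marlin(cap):
        print(f"sm_{cap[0]}{cap[1]}: meets the sm_80 requirement")
    else:
        print(f"sm_{cap[0]}{cap[1]}: below sm_80, not supported")
```

The comparison is a plain tuple comparison, so an RTX 30 series card reporting (8, 6) passes while a 20 series card reporting (7, 5) does not.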
Install the marlin dependency first:
pip install git+https://github.com/IST-DASLab/marlin
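After the install, a quick way to confirm the package is visible to Python is sketched below. It assumes the pip command above installs an importable top-level module named `marlin`; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the Marlin repository installs a module called "marlin".
    if is_installed("marlin"):
        print("marlin is installed")
    else:
        print("marlin not found; run the pip install command above")
```

Using `find_spec` only locates the module, so the check itself does not load any CUDA extension.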
Model download link:
A simple usage example is in the examples folder.