
ChatGLM3-6B GPU memory usage after quantization #1341

Answered by Wooonster
Wooonster asked this question in Q&A

Clearing the CUDA cache right after quantize seems to resolve it:

import torch
from transformers import AutoModel

# model_path points to the ChatGLM3-6B checkpoint; q is the quantization bit width (4 or 8)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device='cuda').quantize(q).half().cuda()

# Check parameter types to confirm the quantization took effect
for name, param in model.named_parameters():
    print(f"{q}-quantized: {name}: {param.dtype}")

# Release cached blocks left over from the pre-quantization weights, then synchronize
torch.cuda.empty_cache()
torch.cuda.synchronize()
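
To check that the cleanup actually returns memory, here is a minimal sketch (the report helper and its tags are illustrative, not part of the original answer) using torch.cuda.memory_allocated() and torch.cuda.memory_reserved(); after empty_cache(), reserved memory should drop while allocated memory stays roughly the same:

import torch

def report(tag):
    # memory_allocated: bytes held by live tensors; memory_reserved: bytes cached by the allocator
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"{tag}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

report("after quantize")
torch.cuda.empty_cache()
torch.cuda.synchronize()
report("after empty_cache")  # reserved should shrink; allocated is unchanged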
nvidia-smi (output truncated): NVIDIA-SMI 550.78, Driver Version 550.78, CUDA Version 12.4

Replies: 1 comment

Answer selected by Wooonster
Category: Q&A · Labels: None yet · 1 participant