Thanks to the ChatGLM team for their work; I'm new to this and have learned a lot. While using the ChatGLM3-6B model, I quantized it to 4/8 bits with the following code:
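The original snippet did not survive the page extraction. For context, loading ChatGLM3-6B with on-the-fly quantization is usually done along these lines (a sketch only; the model path and `bits` default are assumptions, not the poster's actual code):

```python
def load_quantized(model_path="THUDM/chatglm3-6b", bits=4):
    """Sketch of a typical ChatGLM3-6B quantized load (not the poster's exact code).

    quantize(bits) converts the linear layers to 4- or 8-bit weights;
    the model is then moved to the GPU for inference.
    """
    from transformers import AutoModel  # imported lazily to keep the sketch standalone

    model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    model = model.quantize(bits).cuda()  # bits is 4 or 8
    return model.eval()
```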
In theory, GPU memory usage should drop after quantization 🤓, but when I check GPU memory I can see that it does not decrease regardless of the quantization setting. Why is this? Many thanks to anyone who can explain 🙏🙏🙏
Answered by Wooonster, Dec 20, 2024
Promptly clearing GPU memory after calling quantize seems to solve it:
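The cleanup snippet itself is missing from the thread; presumably it is the usual PyTorch pattern: run the garbage collector to reclaim the now-unreferenced full-precision weights, then ask the caching allocator to release its freed blocks, since nvidia-smi counts cached-but-free blocks as "used". A minimal sketch (the helper name is made up here):

```python
import gc

def free_cuda_memory():
    """Call right after `model = model.quantize(bits).cuda()`.

    PyTorch's caching allocator keeps freed blocks reserved, so tools like
    nvidia-smi still report them as used until empty_cache() releases them.
    """
    import torch  # imported lazily so the sketch stands alone

    gc.collect()  # reclaim any now-unreferenced full-precision tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached blocks back to the driver
```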