Cannot fully utilize the GPU during inference #555
Closed
lostmaniac started this conversation in Bad Case
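Running on two GPUs is inefficient to begin with, so a single GPU is recommended. Also, this is the default HF (Hugging Face) loading path with no inference optimizations at all; you can try an inference optimization framework such as vLLM.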
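A minimal sketch of the two suggestions above, assuming the model is served through vLLM's OpenAI-compatible server (`vllm.entrypoints.openai.api_server`) instead of `openai_api.py`. The helper name `build_vllm_server_command`, the model path, the GPU index, and the port are hypothetical. The sketch only builds the command and environment: it pins the process to one physical GPU with `CUDA_VISIBLE_DEVICES` and asks vLLM for `--tensor-parallel-size 1` so nothing is split across two cards.

```python
import os
import sys
from typing import Dict, List, Tuple


def build_vllm_server_command(
    model_path: str, gpu_id: int = 0, port: int = 8000
) -> Tuple[List[str], Dict[str, str]]:
    """Build (but do not run) a command that serves `model_path` with
    vLLM's OpenAI-compatible server on a single GPU.

    Hypothetical helper for illustration; not part of vLLM or this repo.
    """
    cmd = [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", model_path,
        "--tensor-parallel-size", "1",  # one GPU, no tensor parallelism
        "--port", str(port),
    ]
    # Expose only one physical GPU to the server process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return cmd, env


if __name__ == "__main__":
    # "/path/to/model" is a placeholder, not a real checkpoint location.
    cmd, env = build_vllm_server_command("/path/to/model")
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
    # To actually launch it (requires vLLM and a CUDA GPU):
    #   import subprocess; subprocess.run(cmd, env=env, check=True)
```

vLLM batches concurrent requests continuously, which typically keeps the GPU busier than a plain HF `generate` loop; whether it reaches full utilization for this particular model is not established here.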
Environment details:
./openai_api.py
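CUDA monitoring:
Modified file: openai_api.py
During generation, GPU utilization stays at roughly 40% to 55% the whole time.
What do I need to change to fully utilize the GPU and make generation faster? Thanks.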
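To put a number on the utilization reported above, one option is to sample `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -i 0 -l 1` during generation and summarize the values. A minimal sketch, assuming that output format (one integer percentage per line for the single GPU selected with `-i 0`); the helper names and sample values are hypothetical, not the reporter's data.

```python
from statistics import mean
from typing import List


def parse_gpu_utilization(nvidia_smi_output: str) -> List[int]:
    """Parse output of
    nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
    into a list of integer percentages, skipping blank lines."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def summarize(samples: List[int]) -> str:
    """Return a one-line min/max/mean summary of utilization samples."""
    return f"min={min(samples)}% max={max(samples)}% mean={mean(samples):.1f}%"


if __name__ == "__main__":
    # Made-up values for illustration only.
    sample_output = "42\n55\n48\n40\n"
    print(summarize(parse_gpu_utilization(sample_output)))
```

Querying one GPU with `-i 0` keeps the samples unambiguous; without it, a two-GPU machine prints one line per card per sample, interleaved.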