[Bug] big TPOT and ITL when running the offline benchmark #2097
Comments
Hi @TraceIvan, regarding the issues you mentioned, there are three main points:
By the way, I can share my offline throughput comparison of Llama 3.1 70B Instruct on H100 TP4 for your reference (I currently don't have an A100 or A800). In those results, throughput more than doubles (8893.63 vs. 4036.33). If you have any questions, feel free to reach out at any time. Cheers!
For the SGLang test, you can try enabling `enable_mixed_chunk` and `enable_overlap_schedule`. In my current tests, SGLang's performance is close to LMDeploy's and better than vLLM's.
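A minimal sketch of passing those options at server launch. The CLI spellings `--enable-mixed-chunk` and `--enable-overlap-schedule` are assumed from SGLang's server arguments of that era, and `$MODEL` is a placeholder; verify both against your installed version:

```bash
# Hypothetical launch command; flag names should be checked against
# `python -m sglang.launch_server --help` on SGLang 0.3.5.post2.
python -m sglang.launch_server --model-path $MODEL --port 30000 \
    --enable-mixed-chunk --enable-overlap-schedule
```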
Thank you for your suggestion. I will try the vLLM and SGLang versions mentioned in the previous blog post and increase the request rate for the online benchmark.
Describe the bug
I am trying to compare vLLM and SGLang. In the offline benchmark, SGLang's TPOT and ITL are significantly higher than vLLM's. In the online benchmark, SGLang beats vLLM only on TTFT, and the gap on the other metrics is not obvious, which differs from the official test results.
Reproduction
Offline benchmark for SGLang:
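The original command was not captured. A plausible reconstruction, assuming "offline" means all requests are submitted at once (`--request-rate inf`) against a running server, with `$MODEL` as a placeholder:

```bash
# Start the SGLang server (hypothetical model placeholder).
python -m sglang.launch_server --model-path $MODEL --port 30000

# Submit every request at once; bench_serving reports TTFT, TPOT, and ITL.
python -m sglang.bench_serving --backend sglang --port 30000 \
    --dataset-name sharegpt --num-prompts 1000 --request-rate inf
```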
Offline benchmark for vLLM:
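Likewise not captured; a sketch using vLLM's bundled benchmark script under the same all-at-once assumption (`$MODEL` and the number of prompts are placeholders):

```bash
# Start the vLLM OpenAI-compatible server.
python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000

# Same all-at-once load via vLLM's benchmark_serving.py (from the vLLM repo).
python benchmarks/benchmark_serving.py --backend vllm --model $MODEL --port 8000 \
    --dataset-name sharegpt --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 1000 --request-rate inf
```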
Online benchmark for SGLang:
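A sketch of the online variant: the same client, but with a finite arrival rate (8 req/s is an illustrative value, not the rate used in the original report):

```bash
# Requests arrive at a fixed rate instead of all at once.
python -m sglang.bench_serving --backend sglang --port 30000 \
    --dataset-name sharegpt --num-prompts 1000 --request-rate 8
```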
Online benchmark for vLLM:
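And the corresponding vLLM run, again with an illustrative request rate:

```bash
# Finite request rate against the vLLM server started above.
python benchmarks/benchmark_serving.py --backend vllm --model $MODEL --port 8000 \
    --dataset-name sharegpt --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 1000 --request-rate 8
```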
Environment
• Python: 3.10.15
• torch: 2.4.0+cu121
• vLLM: 0.6.3.post1
• SGLang: 0.3.5.post2
• GPU: 1× NVIDIA A800 80GB