This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Benchmarking update - phase 1 #339

Merged
merged 12 commits into main from benchmarking-update
Jun 28, 2024


dbarbuzzi

This PR updates the benchmarking performed in remote-push and nightly runs according to the first set of deliverables from our recent meeting:

  • Only the benchmark_serving.json config is run
    • This is accomplished with a new list, nm_benchmark_base_config_list.txt; the other lists are untouched
  • The benchmark_serving.json has various reductions:
    • Model list reduced to facebook/opt-350m and meta-llama/Meta-Llama-3-8B-Instruct
    • nr-qps list reduced to 300,1
    • Metric tracking reduced to mean TPOT and mean TTFT (other metrics are still recorded and logged as usual)
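For readers unfamiliar with the two tracked metrics, here is a minimal sketch of how mean TTFT (time to first token) and mean TPOT (time per output token) are commonly computed. The PR does not state the formulas, so the definitions below are an assumption based on common usage, and `RequestTiming` and both helper functions are hypothetical names, not part of the benchmark scripts.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTiming:
    """Hypothetical per-request record; field names are illustrative only."""
    ttft_s: float       # seconds from sending the request to the first token
    latency_s: float    # seconds from sending the request to the last token
    output_tokens: int  # number of tokens generated


def mean_ttft_ms(timings):
    """Mean time to first token, in milliseconds."""
    return 1000 * mean(t.ttft_s for t in timings)


def mean_tpot_ms(timings):
    """Mean time per output token, in milliseconds.

    Assumed definition: decode time after the first token, divided by the
    remaining tokens. Requests with a single output token are skipped.
    """
    per_request = [
        (t.latency_s - t.ttft_s) / (t.output_tokens - 1)
        for t in timings
        if t.output_tokens > 1
    ]
    return 1000 * mean(per_request)


timings = [
    RequestTiming(ttft_s=0.1, latency_s=1.1, output_tokens=11),
    RequestTiming(ttft_s=0.3, latency_s=2.3, output_tokens=21),
]
print(mean_ttft_ms(timings), mean_tpot_ms(timings))
```

Tracking only these two means keeps the tracked surface small while still covering both prompt-processing latency (TTFT) and steady-state decode speed (TPOT).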

There is also a small fix to server startup: the host changes from localhost to 127.0.0.1, because localhost on the machines resolves to the IPv6 loopback address ::1, which something in the server stack does not appear to handle.
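The localhost behaviour can be checked with a short diagnostic. This is a sketch using only the Python standard library; `address_families` is a hypothetical helper and the port number is arbitrary, neither comes from the PR.

```python
import socket


def address_families(host, port=8000):
    """Return the set of address-family names that `host` resolves to.

    Hypothetical helper for diagnosis only; the port is an arbitrary choice.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return {socket.AddressFamily(family).name for family, *_ in infos}


# A literal IPv4 address can only resolve to AF_INET, so a client or server
# given 127.0.0.1 never ends up on the IPv6 loopback.
print(address_families("127.0.0.1"))

# The result for "localhost" depends on the machine's hosts file and resolver;
# on the machines described above it would be expected to include AF_INET6.
print(address_families("localhost"))
```

Using the literal address avoids depending on how each machine maps localhost, which is why the change is small but effective.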

On a commit made before this PR was opened, which already contained all of the functional changes, the full benchmark job took under 30 minutes:
https://github.com/neuralmagic/nm-vllm/actions/runs/9669361155/job/26709082658

Member

@andy-neuma left a comment


thanks

@dbarbuzzi merged commit 569c905 into main on Jun 28, 2024
28 checks passed
@dbarbuzzi deleted the benchmarking-update branch on June 28, 2024 at 20:39