[ci][distributed] add tests for custom allreduce (vllm-project#5689)
youkaichao authored Jun 19, 2024
1 parent afed90a commit d571ca0
Showing 2 changed files with 10 additions and 5 deletions.
8 changes: 6 additions & 2 deletions .buildkite/test-pipeline.yaml
@@ -182,7 +182,11 @@ steps:
   - pip install -r requirements-docs.txt
   - SPHINXOPTS="-W" make html
-
-- label: A100 status
+- label: Distributed Tests (A100)
   gpu: a100
   commands:
   - nvidia-smi
+  # NOTE: don't test llama model here, it seems hf implementation is buggy
+  # see https://github.com/vllm-project/vllm/pull/5689 for details
+  - pytest -v -s distributed/test_custom_all_reduce.py
+  - TEST_DIST_MODEL=facebook/opt-125m DISTRIBUTED_EXECUTOR_BACKEND=ray pytest -v -s distributed/test_basic_distributed_correctness.py
+  - TEST_DIST_MODEL=facebook/opt-125m DISTRIBUTED_EXECUTOR_BACKEND=mp pytest -v -s distributed/test_basic_distributed_correctness.py
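The two `TEST_DIST_MODEL` commands in the pipeline run the same correctness test twice, parameterized by environment variables: once with the `ray` executor backend and once with `mp` (multiprocessing). A minimal, hypothetical sketch of how a test module might read these variables (the variable names come from the pipeline above; the defaults and the helper function are assumptions, not vLLM's actual code):

```python
import os

def read_test_config():
    """Hypothetical helper: pick model and executor backend from the
    environment variables set by the CI pipeline. Defaults are assumed."""
    model = os.environ.get("TEST_DIST_MODEL", "facebook/opt-125m")
    backend = os.environ.get("DISTRIBUTED_EXECUTOR_BACKEND", "mp")
    if backend not in ("ray", "mp"):
        raise ValueError(f"unsupported executor backend: {backend}")
    return model, backend

model, backend = read_test_config()
print(model, backend)
```

Driving the matrix from environment variables keeps the pipeline YAML as the single place that enumerates model/backend combinations.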
7 changes: 4 additions & 3 deletions tests/distributed/test_custom_all_reduce.py
@@ -11,7 +11,8 @@
 from vllm.distributed.parallel_state import (get_tensor_model_parallel_group,
                                              get_tp_group, graph_capture)
 
-from ..utils import (init_test_distributed_environment,
+from ..utils import (ensure_model_parallel_initialized,
+                     init_test_distributed_environment,
                      multi_process_tensor_parallel)
 
 random.seed(42)
@@ -27,8 +28,8 @@ def graph_allreduce(tp_size, pp_size, rank, distributed_init_port):
     torch.cuda.set_device(device)
     init_test_distributed_environment(tp_size, pp_size, rank,
                                       distributed_init_port)
-
-    group = get_tensor_model_parallel_group()
+    ensure_model_parallel_initialized(tp_size, pp_size)
+    group = get_tensor_model_parallel_group().device_group
 
     # A small all_reduce for warmup.
     # this is needed because device communicators might be created lazily
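The test above initializes the distributed environment per rank, grabs the tensor-parallel process group, and performs a small warmup all_reduce before the real checks, since device communicators may be created lazily on first use. The invariant a custom-allreduce test verifies is the standard all_reduce contract: every rank contributes a tensor, and afterwards every rank holds the elementwise sum of all contributions. A pure-Python sketch of that contract (an illustration only, standing in for `torch.distributed.all_reduce` and vLLM's custom CUDA kernel):

```python
def all_reduce(per_rank_inputs):
    """Simulate a sum all_reduce: given each rank's input vector, return
    the (identical) output vector each rank would observe afterwards."""
    summed = [sum(vals) for vals in zip(*per_rank_inputs)]
    return [list(summed) for _ in per_rank_inputs]

# Two simulated "ranks" (tp_size=2), the smallest tensor-parallel case.
inputs = [[1.0, 2.0], [10.0, 20.0]]
outputs = all_reduce(inputs)

# Every rank sees the same elementwise sum across all ranks.
assert all(out == [11.0, 22.0] for out in outputs)
```

A real test compares the custom kernel's output against this reference semantics (in practice, against NCCL's all_reduce), both eagerly and under CUDA graph capture.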
