Enable CUDA testing #741

Merged
merged 2 commits on Nov 17, 2023
56 changes: 56 additions & 0 deletions .github/azure-gpu-test.yml
@@ -0,0 +1,56 @@
trigger:
  branches:
    include:
      - "main"

pr:
  branches:
    include:
      - "main"

jobs:
  - job: testing
    timeoutInMinutes: "20"
    cancelTimeoutInMinutes: "2"
    pool: "lit-rtx-3090"
    variables:
      DEVICES: $( python -c 'print("$(Agent.Name)".split("_")[-1])' )
    container:
      image: "pytorchlightning/pytorch_lightning:base-cuda-py3.10-torch2.1-cuda12.1.0"
      options: "--gpus=all --shm-size=8gb"
    workspace:
      clean: all
    steps:

      - bash: |
          echo "##vso[task.setvariable variable=CUDA_VISIBLE_DEVICES]$(DEVICES)"
          cuda_ver=$(python -c "import torch ; print(''.join(map(str, torch.version.cuda.split('.')[:2])))")
          echo "##vso[task.setvariable variable=CUDA_VERSION_MM]$cuda_ver"
          echo "##vso[task.setvariable variable=TORCH_URL]https://download.pytorch.org/whl/cu${cuda_ver}/torch_stable.html"
        displayName: 'set env. vars'

      - bash: |
          echo $(DEVICES)
          echo $CUDA_VISIBLE_DEVICES
          echo $CUDA_VERSION_MM
          echo $TORCH_URL
          whereis nvidia
          nvidia-smi
          which python && which pip
          python --version
          pip --version
          pip list
        displayName: "Image info & NVIDIA"

      - script: |
          pip install -r requirements-all.txt pytest pytest-rerunfailures transformers einops protobuf
        displayName: 'Install dependencies'

      - bash: |
          set -e
          pip list
          python -c "import torch ; mgpu = torch.cuda.device_count() ; assert mgpu == 2, f'GPU: {mgpu}'"
        displayName: "Env details"

      - bash: pytest -v --disable-pytest-warnings --strict-markers --color=yes
        displayName: 'Testing'
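The pipeline leans on two inline Python one-liners: `DEVICES` takes whatever follows the last `_` in the Azure agent name, and `CUDA_VERSION_MM` collapses the major.minor of `torch.version.cuda` into the `cuXXX` token used in the PyTorch wheel URL. A standalone sketch of that parsing, with hard-coded input strings standing in for the real `Agent.Name` and a CUDA build of torch (the example agent name is an assumption, not a real agent):

```python
# Sketch of the inline Python used in the pipeline steps above.
# Inputs are hard-coded so the logic runs without Azure or a CUDA torch.

# DEVICES: the agent name encodes GPU indices after the last "_",
# e.g. a hypothetical agent "lit-rtx-3090_0,1" -> "0,1".
agent_name = "lit-rtx-3090_0,1"
devices = agent_name.split("_")[-1]
print(devices)  # -> 0,1

# CUDA_VERSION_MM: keep only major.minor and drop the dot, "12.1" -> "121".
cuda = "12.1"  # stands in for torch.version.cuda
cuda_ver = "".join(map(str, cuda.split(".")[:2]))
print(cuda_ver)  # -> 121

# TORCH_URL is then built from that token:
print(f"https://download.pytorch.org/whl/cu{cuda_ver}/torch_stable.html")
```

Exporting those values back to later steps is done with the Azure `##vso[task.setvariable ...]` logging command, as shown in the `set env. vars` step.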
2 changes: 2 additions & 0 deletions tests/test_lora.py
@@ -354,6 +354,8 @@ def test_lora_qkv_linear_weights_merged_status(rank, enable_lora, expected_merge
@pytest.mark.skipif(not torch.cuda.is_available(), reason="8bit requires CUDA")
# platform dependent cuda issue: libbitsandbytes_cpu.so: undefined symbol: cquantize_blockwise_fp16_nf4
@pytest.mark.xfail(raises=AttributeError, strict=False)
# https://github.com/Lightning-AI/lit-gpt/issues/513
@pytest.mark.xfail(raises=RuntimeError, strict=True)
def test_lora_merge_with_quantize():
from lightning.fabric.plugins.precision.bitsandbytes import _BITSANDBYTES_AVAILABLE, BitsandbytesPrecision

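The new marker turns the failure tracked in issue #513 into a strict expectation: the test must raise `RuntimeError`, and if it ever stops failing, pytest reports the unexpected pass as a failure, prompting removal of the marker. A minimal standalone sketch of the strict vs. non-strict semantics (not code from this PR):

```python
import pytest

@pytest.mark.xfail(raises=RuntimeError, strict=True)
def test_strict_xfail():
    # strict=True: the test is REQUIRED to fail with RuntimeError;
    # an unexpected pass is reported as XPASS(strict) and fails the run.
    raise RuntimeError("known bug, tracked upstream")

@pytest.mark.xfail(raises=AttributeError, strict=False)
def test_lenient_xfail():
    # strict=False: raising AttributeError yields XFAIL, but an
    # unexpected pass is only reported as XPASS, not a failure.
    raise AttributeError("platform-dependent failure")
```

Stacking both markers, as the diff does, lets the test tolerate the platform-dependent bitsandbytes `AttributeError` while still strictly tracking the `RuntimeError` from #513.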