Avoid poisoning process with CUDA calls as soon as importing #6810
Switch from `torch.cuda.is_available()` to `torch.cuda.device_count() > 0`, to give priority to the NVML-based availability check, so that we avoid poisoning the process with CUDA calls as soon as we execute `import deepspeed` (see https://github.com/pytorch/pytorch/blob/v2.5.1/torch/cuda/__init__.py#L120-L124).
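A minimal sketch of the proposed check (the helper name is illustrative, not DeepSpeed's actual code):

```python
def cuda_seems_available():
    """Probe CUDA availability without poisoning the process.

    torch.cuda.device_count() can answer via an NVML-based probe (see
    the linked torch/cuda/__init__.py), which does not initialize the
    CUDA runtime in this process, unlike is_available()'s default
    cudaGetDeviceCount path.
    """
    try:
        import torch
    except ImportError:  # no torch at all -> no CUDA via torch
        return False
    return torch.cuda.device_count() > 0
```

Because no CUDA context is created by this probe, the caller can still change `CUDA_VISIBLE_DEVICES` or fork worker processes afterwards.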
There are two reasons to make this change:
First, if we accidentally import deepspeed: because the CUDA runtime is initialized on the first CUDA API call and caches the device list at that point, changing `CUDA_VISIBLE_DEVICES` within the same process after initialization has no effect on the visible devices. The specific case:
OpenRLHF/OpenRLHF#524 (comment)
A demo for reproduction before the fix is applied:
Second, from https://pytorch.org/docs/stable/notes/cuda.html:

> When assessing the availability of CUDA in a given environment (`is_available()`), PyTorch's default behavior is to call the CUDA Runtime API method `cudaGetDeviceCount`. Because this call in turn initializes the CUDA Driver API (via `cuInit`) if it is not already initialized, subsequent forks of a process that has run `is_available()` will fail with a CUDA initialization error.
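For code paths that still call `is_available()` and cannot be changed, the same PyTorch note documents an opt-in NVML-based availability check via an environment variable, which must be set before `torch.cuda` availability is first queried:

```shell
# Opt in to the fork-safe, NVML-based torch.cuda.is_available() check
# (set before the Python process first queries CUDA availability)
export PYTORCH_NVML_BASED_CUDA_CHECK=1
```

This sidesteps `cuInit` in the parent process, so subsequent forks can still initialize CUDA themselves.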