openfold-venv environment installs pytorch cpu version #45

paulasanematsu · 2024-05-16T17:20:03Z

Hi,

I am getting started with OpenFold and wanted to run the single-GPU training example before moving on to the multi-node example.

I followed the Option 2 install instructions:

$ bash scripts/build_local_openfold_venv.sh /scratch/paulasan/conda/envs/openfold-venv
$ source scripts/activate_local_openfold_venv.sh /scratch/paulasan/conda/envs/openfold-venv
openfold-venv activated!

The build script installed PyTorch, but it was the CPU-only build, so the GPU could not be found:

(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.__version__)'
2.0.1
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.is_available())'
False
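A `False` from `torch.cuda.is_available()` can mean either a CPU-only build or a CUDA build that cannot see a driver or device. A minimal sketch for telling the two apart is below: it relies on `torch.version.cuda`, which is `None` in CPU-only builds. The helper name `describe_torch_build` is hypothetical and not part of OpenFold or PyTorch.

```python
def describe_torch_build(torch_module):
    """Summarize whether an imported torch module is a CPU-only or CUDA build.

    Hypothetical helper for illustration; pass in the imported `torch` module.
    """
    # torch.version.cuda is None for CPU-only builds and a version string
    # such as "11.7" for CUDA builds.
    built_cuda = torch_module.version.cuda
    return {
        "torch_version": torch_module.__version__,
        "built_with_cuda": built_cuda,
        "cpu_only_build": built_cuda is None,
        # False here with a CUDA build points at the driver or device instead.
        "gpu_visible": bool(torch_module.cuda.is_available()),
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        for key, value in describe_torch_build(torch).items():
            print(f"{key}: {value}")
```

If `cpu_only_build` is reported as true, reinstalling a CUDA build is the fix; if it is false but `gpu_visible` is also false, the problem is more likely the node, driver, or job allocation.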

Following PyTorch's installation instructions for previous versions, I installed pytorch-cuda=11.7:

(openfold-venv) [paulasan@holygpu8a22605 openfold]$ conda install pytorch-cuda=11.7 -c pytorch -c nvidia
Channels:
 - pytorch
 - nvidia
 - defaults
 - conda-forge
 - bioconda
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /scratch/paulasan/conda/envs/openfold-venv

  added / updated specs:
    - pytorch-cuda=11.7


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cuda-cudart-11.7.99        |                0         194 KB  nvidia
    cuda-cupti-11.7.101        |                0        22.9 MB  nvidia
    cuda-libraries-11.7.1      |                0           1 KB  nvidia
    cuda-nvrtc-11.7.99         |                0        17.3 MB  nvidia
    cuda-nvtx-11.7.91          |                0          57 KB  nvidia
    cuda-runtime-11.7.1        |                0           1 KB  nvidia
    libcublas-11.10.3.66       |                0       286.1 MB  nvidia
    libcufft-10.7.2.124        |       h4fbf590_0        93.6 MB  nvidia
    libcufile-1.9.1.3          |                0         1.0 MB  nvidia
    libcurand-10.3.5.147       |                0        51.8 MB  nvidia
    libcusolver-11.4.0.1       |                0        78.7 MB  nvidia
    libcusparse-11.7.4.91      |                0       151.1 MB  nvidia
    libnpp-11.7.4.75           |                0       129.3 MB  nvidia
    libnvjpeg-11.8.0.2         |                0         2.2 MB  nvidia
    pytorch-2.0.1              |py3.8_cuda11.7_cudnn8.5.0_0        1.20 GB  pytorch
    pytorch-cuda-11.7          |       h778d358_5           3 KB  pytorch
    pytorch-mutex-1.0          |             cuda           3 KB  pytorch
    torchtriton-2.0.0          |             py38        62.6 MB  pytorch
    ------------------------------------------------------------
                                           Total:        2.07 GB

The following NEW packages will be INSTALLED:

  cuda-cudart        nvidia/linux-64::cuda-cudart-11.7.99-0
  cuda-cupti         nvidia/linux-64::cuda-cupti-11.7.101-0
  cuda-libraries     nvidia/linux-64::cuda-libraries-11.7.1-0
  cuda-nvrtc         nvidia/linux-64::cuda-nvrtc-11.7.99-0
  cuda-nvtx          nvidia/linux-64::cuda-nvtx-11.7.91-0
  cuda-runtime       nvidia/linux-64::cuda-runtime-11.7.1-0
  libcublas          nvidia/linux-64::libcublas-11.10.3.66-0
  libcufft           nvidia/linux-64::libcufft-10.7.2.124-h4fbf590_0
  libcufile          nvidia/linux-64::libcufile-1.9.1.3-0
  libcurand          nvidia/linux-64::libcurand-10.3.5.147-0
  libcusolver        nvidia/linux-64::libcusolver-11.4.0.1-0
  libcusparse        nvidia/linux-64::libcusparse-11.7.4.91-0
  libnpp             nvidia/linux-64::libnpp-11.7.4.75-0
  libnvjpeg          nvidia/linux-64::libnvjpeg-11.8.0.2-0
  pytorch-cuda       pytorch/linux-64::pytorch-cuda-11.7-h778d358_5
  torchtriton        pytorch/linux-64::torchtriton-2.0.0-py38

The following packages will be UPDATED:

  pytorch-mutex                                     1.0-cpu --> 1.0-cuda

The following packages will be DOWNGRADED:

  pytorch                                 2.0.1-py3.8_cpu_0 --> 2.0.1-py3.8_cuda11.7_cudnn8.5.0_0

Note that the environment had the CPU build (py3.8_cpu_0), which conda is now replacing with the CUDA build (py3.8_cuda11.7_cudnn8.5.0_0) of the same PyTorch version, 2.0.1.

After the update, the GPU was found:
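The only visible difference between the two packages in the output above is the conda build string, which `conda list pytorch` also reports. As a rough sketch, a build string can be classified like this; the function name `classify_pytorch_build` is hypothetical, and the parsing assumes the underscore-separated format seen in this transcript.

```python
import re


def classify_pytorch_build(build_string):
    """Return 'cpu', 'cuda<version>', or 'unknown' for a conda build string.

    Assumes underscore-separated fields such as 'py3.8_cpu_0' or
    'py3.8_cuda11.7_cudnn8.5.0_0', as seen in the conda output above.
    """
    fields = build_string.split("_")
    if "cpu" in fields:
        return "cpu"
    for field in fields:
        # Match 'cuda11.7' but not 'cudnn8.5.0'.
        match = re.fullmatch(r"cuda(\d+(?:\.\d+)*)", field)
        if match:
            return "cuda" + match.group(1)
    return "unknown"


if __name__ == "__main__":
    for build in ("py3.8_cpu_0", "py3.8_cuda11.7_cudnn8.5.0_0"):
        print(build, "->", classify_pytorch_build(build))
```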

(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.__version__)'
2.0.1
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.is_available())'
True
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.device_count())'
1
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.current_device())'
0
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.device(0))'
<torch.cuda.device object at 0x14d4a48b16a0>
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.get_device_name(0))'
NVIDIA A100-SXM4-80GB

Finally, I haven't started the multi-node run yet, but I am wondering whether the CPU version is intended for multi-node training. I ask because I don't see any GPU resources requested in this batch script.

Thank you,
Paula

--
Paula C. Sanematsu
Sr. Research Computing Facilitator
FAS Research Computing
Harvard University
