openfold-venv environment installs pytorch cpu version #45

Open
paulasanematsu opened this issue May 16, 2024 · 0 comments

Hi,

I am getting started with OpenFold and wanted to run the single-GPU training example first, before moving on to the multi-node example.

I followed the Option 2 install instructions:

$ bash scripts/build_local_openfold_venv.sh /scratch/paulasan/conda/envs/openfold-venv
$ source scripts/activate_local_openfold_venv.sh /scratch/paulasan/conda/envs/openfold-venv
openfold-venv activated!

The script installed PyTorch, but it was the CPU-only build, so the GPU could not be found:

(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.__version__)'
2.0.1
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.is_available())'
False
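
For anyone reproducing this, a slightly more informative check than `torch.cuda.is_available()` alone is to also print `torch.version.cuda`, which is `None` on a CPU-only build. The following is only a minimal sketch; the function name `describe_torch_build` is hypothetical and not part of OpenFold or PyTorch:

```python
import importlib.util


def describe_torch_build():
    """Report on the installed PyTorch build, or return None if PyTorch is absent.

    Hypothetical helper for illustration; not part of OpenFold or PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment

    import torch

    return {
        "version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if the build has CUDA support AND a usable GPU/driver is visible
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `built_with_cuda` is `None`, the problem is the installed package itself rather than the driver or the node.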

Following PyTorch's installation instructions for previous versions, I installed pytorch-cuda=11.7:

(openfold-venv) [paulasan@holygpu8a22605 openfold]$ conda install pytorch-cuda=11.7 -c pytorch -c nvidia
Channels:
 - pytorch
 - nvidia
 - defaults
 - conda-forge
 - bioconda
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /scratch/paulasan/conda/envs/openfold-venv

  added / updated specs:
    - pytorch-cuda=11.7


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cuda-cudart-11.7.99        |                0         194 KB  nvidia
    cuda-cupti-11.7.101        |                0        22.9 MB  nvidia
    cuda-libraries-11.7.1      |                0           1 KB  nvidia
    cuda-nvrtc-11.7.99         |                0        17.3 MB  nvidia
    cuda-nvtx-11.7.91          |                0          57 KB  nvidia
    cuda-runtime-11.7.1        |                0           1 KB  nvidia
    libcublas-11.10.3.66       |                0       286.1 MB  nvidia
    libcufft-10.7.2.124        |       h4fbf590_0        93.6 MB  nvidia
    libcufile-1.9.1.3          |                0         1.0 MB  nvidia
    libcurand-10.3.5.147       |                0        51.8 MB  nvidia
    libcusolver-11.4.0.1       |                0        78.7 MB  nvidia
    libcusparse-11.7.4.91      |                0       151.1 MB  nvidia
    libnpp-11.7.4.75           |                0       129.3 MB  nvidia
    libnvjpeg-11.8.0.2         |                0         2.2 MB  nvidia
    pytorch-2.0.1              |py3.8_cuda11.7_cudnn8.5.0_0        1.20 GB  pytorch
    pytorch-cuda-11.7          |       h778d358_5           3 KB  pytorch
    pytorch-mutex-1.0          |             cuda           3 KB  pytorch
    torchtriton-2.0.0          |             py38        62.6 MB  pytorch
    ------------------------------------------------------------
                                           Total:        2.07 GB

The following NEW packages will be INSTALLED:

  cuda-cudart        nvidia/linux-64::cuda-cudart-11.7.99-0
  cuda-cupti         nvidia/linux-64::cuda-cupti-11.7.101-0
  cuda-libraries     nvidia/linux-64::cuda-libraries-11.7.1-0
  cuda-nvrtc         nvidia/linux-64::cuda-nvrtc-11.7.99-0
  cuda-nvtx          nvidia/linux-64::cuda-nvtx-11.7.91-0
  cuda-runtime       nvidia/linux-64::cuda-runtime-11.7.1-0
  libcublas          nvidia/linux-64::libcublas-11.10.3.66-0
  libcufft           nvidia/linux-64::libcufft-10.7.2.124-h4fbf590_0
  libcufile          nvidia/linux-64::libcufile-1.9.1.3-0
  libcurand          nvidia/linux-64::libcurand-10.3.5.147-0
  libcusolver        nvidia/linux-64::libcusolver-11.4.0.1-0
  libcusparse        nvidia/linux-64::libcusparse-11.7.4.91-0
  libnpp             nvidia/linux-64::libnpp-11.7.4.75-0
  libnvjpeg          nvidia/linux-64::libnvjpeg-11.8.0.2-0
  pytorch-cuda       pytorch/linux-64::pytorch-cuda-11.7-h778d358_5
  torchtriton        pytorch/linux-64::torchtriton-2.0.0-py38

The following packages will be UPDATED:

  pytorch-mutex                                     1.0-cpu --> 1.0-cuda

The following packages will be DOWNGRADED:

  pytorch                                 2.0.1-py3.8_cpu_0 --> 2.0.1-py3.8_cuda11.7_cudnn8.5.0_0

Note that the environment had the CPU build of PyTorch (2.0.1-py3.8_cpu_0), which conda is now replacing with the CUDA 11.7 build.
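
The build string (as shown by `conda list pytorch` or in the plan above) is what distinguishes the two variants. As a rough sketch, and assuming the naming pattern seen in this log (`cpu` or `cuda<version>` embedded in the build string) holds for other builds, a hypothetical helper could classify it like this:

```python
import re


def classify_pytorch_build(build: str) -> str:
    """Classify a conda build string as 'cpu', 'cuda<version>', or 'unknown'.

    Hypothetical helper; assumes the naming pattern seen in the log above.
    """
    # Look for a CUDA toolkit version such as "cuda11.7" (this does not match "cudnn8.5.0")
    match = re.search(r"cuda(\d+(?:\.\d+)*)", build)
    if match:
        return "cuda" + match.group(1)
    # CPU-only builds carry a standalone "cpu" token between underscores
    if re.search(r"(?:^|_)cpu(?:_|$)", build):
        return "cpu"
    return "unknown"


if __name__ == "__main__":
    for build in ("py3.8_cpu_0", "py3.8_cuda11.7_cudnn8.5.0_0"):
        print(build, "->", classify_pytorch_build(build))
```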

After the update, the GPU was found:

(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.__version__)'
2.0.1
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.is_available())'
True
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.device_count())'
1
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.current_device())'
0
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.device(0))'
<torch.cuda.device object at 0x14d4a48b16a0>
(openfold-venv) [paulasan@holygpu8a22605 openfold]$ python -c 'import torch;print(torch.cuda.get_device_name(0))'
NVIDIA A100-SXM4-80GB
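
The individual checks above can also be combined into one short script. This is only a sketch, with a hypothetical function name `list_cuda_devices`. Note that `torch.cuda.device(0)` is a context manager for selecting a device, which is why printing it shows an object rather than device details; `torch.cuda.get_device_name` is the more useful call here.

```python
import importlib.util


def list_cuda_devices():
    """Return the names of all visible CUDA devices, or an empty list if none.

    Hypothetical helper for illustration; not part of OpenFold or PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        return []  # PyTorch is not installed

    import torch

    if not torch.cuda.is_available():
        return []  # CPU-only build, or no usable GPU/driver

    # One entry per visible device, indexed 0 .. device_count() - 1
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    for index, name in enumerate(list_cuda_devices()):
        print(f"cuda:{index} {name}")
```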

Finally, I haven't started the multi-node run yet, but is the CPU build intended for multi-node training? I ask because I don't see any GPU resources requested in this batch script.

Thank you,
Paula

--
Paula C. Sanematsu
Sr. Research Computing Facilitator
FAS Research Computing
Harvard University
