pip repeatedly selecting wrong CUDA version during install #308

Open
fratajcz opened this issue Feb 9, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@fratajcz

fratajcz commented Feb 9, 2023

Hi!

I am trying to get an installation running on an HPC cluster with somewhat older dependencies. I have found torch 1.8.0 with CUDA 11.1 to work, so now I want to install the matching torch_sparse version:

$ pip install torch-sparse -f https://data.pyg.org/whl/torch-1.8.0+cu111.html --no-cache-dir

Looking in links: https://data.pyg.org/whl/torch-1.8.0+cu111.html
Collecting torch-sparse
  Downloading torch_sparse-0.6.16.tar.gz (208 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 208.2/208.2 kB 11.8 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: scipy in /home/icb/florin.ratajczak/anaconda3/envs/speostest/lib/python3.7/site-packages (from torch-sparse) (1.7.3)
Requirement already satisfied: numpy<1.23.0,>=1.16.5 in /home/icb/florin.ratajczak/anaconda3/envs/speostest/lib/python3.7/site-packages (from scipy->torch-sparse) (1.21.5)
Building wheels for collected packages: torch-sparse
  Building wheel for torch-sparse (setup.py) ... done
  Created wheel for torch-sparse: filename=torch_sparse-0.6.16-cp37-cp37m-linux_x86_64.whl size=1716194 sha256=46187b7f1bf7bae117f0e216f72bd6cf67ae95c2f9b6734d5369aa47099de042
  Stored in directory: /tmp/pip-ephem-wheel-cache-23v87jns/wheels/ff/5c/28/5d12cf8ac7bb8bc3de9dda8fa446cb4aeb9fffe19ef1028538
Successfully built torch-sparse
Installing collected packages: torch-sparse
Successfully installed torch-sparse-0.6.16

I just added the --no-cache-dir flag to make sure it doesn't load some other version from the cache.

However, when I now try to import it, it says it was compiled for CUDA 12.0:

(speostest) [florin.ratajczak@hpc-submit03gui speos]$ python
Python 3.7.16 (default, Jan 17 2023, 22:20:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.8.0'
>>> torch.version.cuda
'11.1'
>>> import torch_sparse
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/icb/florin.ratajczak/anaconda3/envs/speostest/lib/python3.7/site-packages/torch_sparse/__init__.py", line 33, in <module>
    f'Detected that PyTorch and torch_sparse were compiled with '
RuntimeError: Detected that PyTorch and torch_sparse were compiled with different CUDA versions. PyTorch has CUDA version 11.1 and torch_sparse has CUDA version 12.0. Please reinstall the torch_sparse that matches your PyTorch install.

Any idea why it says it was compiled for CUDA 12.0? I might have to host a course on that cluster in a week or so, so any help is much appreciated!
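
One detail in the pip log above may be relevant: pip downloads `torch_sparse-0.6.16.tar.gz` and then runs "Building wheel", so it compiled a source distribution locally instead of fetching a prebuilt wheel from the `-f` page. A local build would likely pick up whatever CUDA toolkit is visible at build time, which could explain the 12.0 (an assumption, not confirmed in this thread). A minimal sketch for spotting this in a pip log; the helper name `classify_pip_artifacts` is hypothetical:

```python
import re


def classify_pip_artifacts(pip_log):
    """Map each file named on a 'Downloading ...' line to 'wheel' or 'sdist'."""
    result = {}
    for match in re.finditer(r"Downloading (\S+)", pip_log):
        # pip may print a full URL or a bare file name; keep the last path part.
        name = match.group(1).rsplit("/", 1)[-1]
        if name.endswith(".whl"):
            result[name] = "wheel"   # prebuilt binary, no local compilation
        elif name.endswith((".tar.gz", ".zip")):
            result[name] = "sdist"   # source archive, compiled on this machine
    return result


# Excerpt of the log posted above.
log = """Collecting torch-sparse
  Downloading torch_sparse-0.6.16.tar.gz (208 kB)
  Preparing metadata (setup.py) ... done
"""
print(classify_pip_artifacts(log))
# → {'torch_sparse-0.6.16.tar.gz': 'sdist'}
```

An `sdist` entry for torch-sparse means the CUDA version baked into the extension comes from the build machine, not from the `+cu111` tag in the URL.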

@arne-vdb

Hi, I have a similar problem.

I installed torch-sparse like this:
pip install torch-scatter torch-sparse==0.6.12 torch-cluster torch-spline-conv torch-geometric==2.0.4 -f https://data.pyg.org/whl/torch-1.8.2+cpu.html --no-cache-dir

But while running I get the following error:

Traceback (most recent call last):
File "/dss/dsshome1/lxc03/gobi005/BlockEQTL/speos/speos/scripts/explanation_scripts/explanation_one_model.py", line 2, in <module>
from speos.models import ModelBootstrapper
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/speos/models.py", line 1, in <module>
from speos.architectures import GeneNetwork, RelationalGeneNetwork, FCNN, LINKX, SimpleGCN
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/speos/architectures.py", line 4, in <module>
import torch_geometric.nn as pyg_nn
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_geometric/__init__.py", line 4, in <module>
import torch_geometric.data
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
from .data import Data
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_geometric/data/data.py", line 9, in <module>
from torch_sparse import SparseTensor
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_sparse/__init__.py", line 16, in <module>
f'{library}_{suffix}', [osp.dirname(__file__)]).origin)
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch/_ops.py", line 104, in load_library
ctypes.CDLL(path)
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so: undefined symbol: __kmpc_fork_call

I think the reason for this might also be a mismatch between the CUDA versions of PyTorch and torch-sparse.
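
For context, `__kmpc_fork_call` is an entry point of the LLVM/Intel OpenMP runtime, so this error suggests the extension cannot find a matching OpenMP library rather than a CUDA mismatch (the install above used the `+cpu` index). A minimal sketch, assuming a POSIX system, for checking whether a shared library exports a symbol; the helper `has_symbol` and the candidate library names are illustrative assumptions, not part of torch or torch-sparse:

```python
import ctypes


def has_symbol(library_path, symbol):
    """Return True if the shared library can be loaded and exports `symbol`.

    Passing None inspects the symbols already loaded into the current process.
    """
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        return False  # library not found or not loadable
    try:
        lib[symbol]   # ctypes raises AttributeError for an unknown symbol
    except AttributeError:
        return False
    return True


# Candidate OpenMP runtimes; which one (if any) is installed depends on the system.
for candidate in ("libiomp5.so", "libomp.so"):
    print(candidate, has_symbol(candidate, "__kmpc_fork_call"))
```

If every candidate prints `False`, the OpenMP runtime the extension was linked against is probably not on the loader path of that environment.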

@rusty1s
Owner

rusty1s commented Mar 1, 2023

Do you have a PyTorch CUDA version installed? Otherwise, I don't think this is necessarily related to CUDA but more due to a mismatch in PyTorch versions.

@fratajcz
Author

fratajcz commented Mar 1, 2023

Thanks for the response!

I have tested several CUDA and CPU versions, and the only combination that worked was installing torch for CUDA 11.6 (which is CUDA 12.0 compatible) and then installing torch-sparse for 11.6 as well. All other CUDA/CPU versions lead to torch-sparse being incompatible with torch, even when both were installed following the correct instructions. Also, as my output above shows, the error message says that torch is compiled for CUDA 11.1 and torch-sparse for 12.0, even though I explicitly requested the cu111 version. Perhaps some paths got mixed up and torch-sparse is installed for CUDA 12.0 no matter which -f argument is specified?
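
To rule out a typo in the `-f` argument, the index URL can be derived from the installed torch build instead of being typed by hand. A minimal sketch, assuming the URL pattern seen in the two commands in this thread (`torch-<version>+<tag>.html`); the helper name `pyg_wheel_index` is hypothetical:

```python
def pyg_wheel_index(torch_version, cuda_version):
    """Build the -f URL for a given torch version and CUDA version.

    cuda_version is a string such as "11.1", or None for a CPU-only build.
    """
    base = torch_version.split("+", 1)[0]  # drop a local tag like "+cu111"
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


print(pyg_wheel_index("1.8.0", "11.1"))
# → https://data.pyg.org/whl/torch-1.8.0+cu111.html
print(pyg_wheel_index("1.8.2", None))
# → https://data.pyg.org/whl/torch-1.8.2+cpu.html

# With torch installed, the arguments would come from the running build:
#   import torch
#   pyg_wheel_index(torch.__version__, torch.version.cuda)
```

This only confirms that the URL matches the torch build; it does not guarantee that the page lists a wheel for the Python version in use.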

@github-actions

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

@github-actions github-actions bot added the stale label Aug 29, 2023
@rusty1s rusty1s added bug Something isn't working and removed stale labels Aug 29, 2023
@Delaunay

Have you tried pip install --no-build-isolation?
Without that flag, pip installs torch into an isolated environment to build the package and installs the result afterward.
I had this issue with ROCm, where it would ignore the ROCm torch version and install the default torch + CUDA version.
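
A minimal sketch of the suggested invocation, reusing the package and `-f` URL from the first post. The helper `build_install_command` is hypothetical, and whether the flag changes anything for this particular build is not confirmed in the thread:

```python
import sys


def build_install_command(package, index_url, no_build_isolation=True):
    """Assemble a pip command that runs under the current interpreter."""
    cmd = [sys.executable, "-m", "pip", "install", package,
           "-f", index_url, "--no-cache-dir"]
    if no_build_isolation:
        # Build against the packages already installed in this environment
        # (including the existing torch) instead of a fresh build environment.
        cmd.append("--no-build-isolation")
    return cmd


cmd = build_install_command(
    "torch-sparse", "https://data.pyg.org/whl/torch-1.8.0+cu111.html")
print(" ".join(cmd))

# To actually run it:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Running pip through `sys.executable -m pip` makes sure the package lands in the same environment as the torch being imported.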
