Cannot find libdevice in TF 2.11 + compilation fails without ptxas #296
Comments
can you point to the specific fix? |
This is the clearest set of instructions I found: tensorflow/tensorflow#56927 (comment) |
Specifically, if I do these steps, the error goes away
There's a new error about |
By new error, do you mean |
Yeah, with tensorflow 2.10 i get:
I wonder if we just have to disable xla. |
For internal reference, this is the pull request that moved that code last. That said, I just don't get what the problem is. Maybe we have to disable XLA? |
Yes, here's the error printout I get after applying the first "fix":
|
I uploaded some packages built with the following patch:
diff --git a/recipe/build.sh b/recipe/build.sh
index 95db01e..a71c8c6 100644
--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -105,7 +105,7 @@ if [[ "${target_platform}" == "osx-arm64" ]]; then
# See https://conda-forge.org/docs/maintainer/knowledge_base.html#newer-c-features-with-old-sdk
export CXXFLAGS="${CXXFLAGS} -D_LIBCPP_DISABLE_AVAILABILITY"
fi
-export TF_ENABLE_XLA=1
+export TF_ENABLE_XLA=0
export BUILD_TARGET="//tensorflow/tools/pip_package:build_pip_package //tensorflow/tools/lib_package:libtensorflow //tensorflow:libtensorflow_cc${SHLIB_EXT}"
# Python settings
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
index 7fb9b6b..b31eb19 100644
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -16,7 +16,7 @@ source:
folder: tensorflow-estimator
build:
- number: 0
+ number: 1
skip: true # [win]
skip: true # [python_impl == 'pypy']
skip: true # [libabseil != '20220623.0']
|
Not sure if this helps, but I found that this is specifically triggered by the new optimizers that they made the default in TF 2.11. If you use
you get the error, but if you use
no error. |
I opened an issue with the Keras team here keras-team/tf-keras#62, in case that yields any results. |
Do you get the same results if you install from their conda packages and not ours? Typically people don't like to debug conda-forge stuff. |
Following TensorFlow's recommended installation steps, i.e.
produces the same error. The |
great thank you for confirming. |
This is a long-standing problem with XLA needing ptxas. If you get ptxas from somewhere else, e.g., |
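Whether a ptxas binary is visible to the process can be checked before running anything on the GPU. The following is only a minimal sketch using the standard library: the helper name `find_ptxas` is hypothetical, and it assumes that looking on `PATH` (plus any extra directories you pass in) is a reasonable approximation of where XLA searches, which may not hold for every build.

```python
import os
import shutil


def find_ptxas(extra_dirs=()):
    """Return the path to a ptxas executable, or None if none is found.

    `extra_dirs` are searched before the directories listed in PATH.
    """
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # Drop empty entries so the current directory is not searched implicitly.
    search_dirs = [d for d in (*extra_dirs, *path_dirs) if d]
    return shutil.which("ptxas", path=os.pathsep.join(search_dirs))


if __name__ == "__main__":
    location = find_ptxas()
    if location is None:
        print("ptxas not found on PATH")
    else:
        print(f"ptxas found at {location}")
```

If this reports that ptxas is missing inside a fresh conda environment but finds it on a machine with a system-wide CUDA installation, that would be consistent with the behaviour described in this thread.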
I've been tracking this for a while. I think we don't get reports of this "bug" because people who use CUDA usually have more than one installation, so our tensorflow somehow picks up everything it needs from elsewhere when it is not available in conda-forge. In my experience, this is only ptxas, but it could be other things. For example, people on HPC systems usually have native installations of cuda, and ptxas is often part of that (not always, but one could always request it from admins). The good news: a whole new way of dealing with cuda is coming to conda-forge (great!) |
This doesn't make the initial |
Yeah, we will need to fix the libdevice issue separately |
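Until that is fixed in the package, the workarounds referenced in this thread amount to making sure XLA can see a directory that contains libdevice. The sketch below is not the packaged fix. It assumes that XLA expects the layout `<dir>/nvvm/libdevice/libdevice*.bc` and that the installed build honours the `--xla_gpu_cuda_data_dir` XLA flag; `find_cuda_data_dir` and `xla_flags_for` are hypothetical helper names.

```python
import os
import shlex
from pathlib import Path


def find_cuda_data_dir(prefix):
    """Return the first directory D under `prefix` where D/nvvm/libdevice
    contains a libdevice*.bc file, or None if there is no such directory."""
    for bitcode in sorted(Path(prefix).rglob("libdevice*.bc")):
        libdevice_dir = bitcode.parent
        if libdevice_dir.name == "libdevice" and libdevice_dir.parent.name == "nvvm":
            return libdevice_dir.parent.parent
    return None


def xla_flags_for(data_dir, existing=""):
    """Append --xla_gpu_cuda_data_dir=<data_dir> to an existing XLA_FLAGS value."""
    flag = f"--xla_gpu_cuda_data_dir={data_dir}"
    return f"{existing} {flag}".strip()


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; fall back to the current directory.
    prefix = os.environ.get("CONDA_PREFIX", ".")
    data_dir = find_cuda_data_dir(prefix)
    if data_dir is None:
        print(f"no nvvm/libdevice/libdevice*.bc found under {prefix}")
    else:
        # Intended to be set in the shell before Python (and TensorFlow) starts.
        flags = xla_flags_for(data_dir, os.environ.get("XLA_FLAGS", ""))
        print("export XLA_FLAGS=" + shlex.quote(flags))
```

If the helper finds nothing, the libdevice file is either absent or sits in a differently structured directory, which matches the layout problem discussed in this thread.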
I can confirm that installing |
I have this libdevice issue too. Fix is appreciated. |
At least we see
with tensorflow-gpu 2.10 as well from conda-forge. Workaround was to create |
hmm, i just hit this again. I was unable to "fix" it so I had to downgrade to tensorflow 2.13 for the moment, will revisit "soon" |
Thanks Mark for drawing my attention to this! 🙏
Think there is a structuring issue with NVVM in the
Idk if just restructuring the NVVM contents is enough to fix the issue, but it is at least a required step.
The CUDA 12 packages are better structured (and more complete), so it is possible that using CUDA 12 will also fix the issue. |
I found a workaround for TF 2.14:
pip install nvidia-cuda-nvcc-cu11
This PyPI package contains |
@jakirkham Do you know what package includes NVVM files? It may need to be added to #353 |
Though I think TensorFlow hasn't been rebuilt for CUDA 12 yet ( #354 ) |
This workaround works! |
I feel like I'm hitting this again, but I have tried to install
These are the cuda packages I have:
The same reproducer
still creates the effect. I have
|
Never mind.
So I think the problem is that the
reveals that the build_env is in the
Replacing that manually with the
We should be able to make this substitution readily in the recipe, but it would require a recompilation of the packages.
Never mind, it succeeded because I was in the directory
prior to running
|
Still, the "solution" seems to be to:
In fact I think it is related to CUDNN 8 vs 9.....
|
I can't help but think that it is the same issue as: |
The plot thickens:
mamba list of cudnn9_202
mamba list of cudnn9_203
Nothing really jumps out at me though from #403 |
It seems that Uwe has a plan (see discussion in #405), but just wanted to report that
seems to "resolve" things for those that need an immediate "fix" |
I don't have a plan on how to fix this issue though :( |
You don't think it is similar to conda-forge/jaxlib-feedstock#281 (comment)? |
I think actually all you need is:
so it might just be the same issue where the build_prefix isn't getting replaced. |
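One way to test the hypothesis that a build-time prefix was left unreplaced is to scan an installed binary for a marker string. The following is a minimal sketch, not a conda tool: `file_contains` is a hypothetical helper, and the marker `_build_env` is an assumption based on the build_env path mentioned earlier in this thread. Which file to scan is left to the caller.

```python
import sys


def file_contains(path, marker, chunk_size=1 << 20):
    """Return True if the bytes `marker` occur anywhere in the file at `path`."""
    overlap = len(marker) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            # Keep the last few bytes so a marker split across chunks is still found.
            tail = window[-overlap:] if overlap > 0 else b""


if __name__ == "__main__":
    # Usage: python scan.py FILE [FILE ...]
    for candidate in sys.argv[1:]:
        found = file_contains(candidate, b"_build_env")  # marker is an assumption
        print(f"{candidate}: {'contains' if found else 'does not contain'} _build_env")
```

A match only shows that a build-time path is embedded in the file; it does not by itself prove that this path is the one used to locate libdevice.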
I see that @drasmuss opened an issue with as well:
@drasmuss thank you for the clear reproducer. In the future, I would specify that you are using conda-forge and not Anaconda. We experiment differently than upstream does, especially with splayed layouts, which can cause this problem.
Ok, spelunking through the tensorflow code base, I find:
Gonna keep looking to see if there is an official flag we can trigger. Attempting this fix locally:
fingers crossed
with the sed:
on main:
hmm that causes things to crash... I can "fix" the crash by specifying |
Alright, new patching candidate:
On master (today): https://github.com/tensorflow/tensorflow/blob/d0ec13c1322e2c0d2584654634cc833541339376/third_party/xla/third_party/tsl/tsl/platform/default/cuda_root_path.cc#L59
The variable
|
Solution to issue cannot be found in the documentation.
Issue
TensorFlow 2.11 broke something about how it locates the libdevice library when cuda is installed through conda. See tensorflow/tensorflow#56927 or tensorflow/tensorflow#59013.
Here is a simple repro script:
Which gives the error:
I suspect that this is a bug on TensorFlow's end, not something you are really responsible for. But the only fixes in the issues linked above involve hacky workarounds, manually copying the libdevice file to some other location where TensorFlow is expecting to find it. So I'm wondering if it'd be possible to fix it more robustly in the conda-forge package, so that we don't have to manually copy files around every time we create a new environment.
Installed packages
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.4.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.8.3 py39hb9d737c_1 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
astunparse 1.6.3 pyhd8ed1ab_0 conda-forge
async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
attrs 22.2.0 pyh71513ae_0 conda-forge
blinker 1.5 pyhd8ed1ab_0 conda-forge
brotlipy 0.7.0 py39hb9d737c_1005 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.12.7 ha878542_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.2.0 pyhd8ed1ab_0 conda-forge
certifi 2022.12.7 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py39he91dace_3 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 8.1.3 unix_pyhd8ed1ab_2 conda-forge
cryptography 39.0.0 py39h079d5ae_0 conda-forge
cudatoolkit 11.8.0 h37601d7_11 conda-forge
cudnn 8.4.1.50 hed8a83a_0 conda-forge
flatbuffers 22.12.06 hcb278e6_2 conda-forge
frozenlist 1.3.3 py39hb9d737c_0 conda-forge
gast 0.4.0 pyh9f0ad1d_0 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
google-auth 2.15.0 pyh1a96a4e_0 conda-forge
google-auth-oauthlib 0.4.6 pyhd8ed1ab_0 conda-forge
google-pasta 0.2.0 pyh8c360ce_0 conda-forge
grpcio 1.51.1 py39h8c60046_0 conda-forge
h5py 3.7.0 nompi_py39h817c9c5_102 conda-forge
hdf5 1.12.2 nompi_h4df4325_101 conda-forge
icu 70.1 h27087fc_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
importlib-metadata 6.0.0 pyha770c72_0 conda-forge
jpeg 9e h166bdaf_2 conda-forge
keras 2.11.0 pyhd8ed1ab_0 conda-forge
keras-preprocessing 1.1.2 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.20.1 h81ceb04_0 conda-forge
ld_impl_linux-64 2.39 hcc3a1bd_1 conda-forge
libabseil 20220623.0 cxx17_h05df665_6 conda-forge
libaec 1.0.6 h9c3ff4c_0 conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libcurl 7.87.0 hdc1c0ab_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
libgrpc 1.51.1 h30feacc_0 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libnghttp2 1.51.0 hff17c54_0 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libprotobuf 3.21.12 h3eb15da_0 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libssh2 1.10.0 hf14f497_3 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
markdown 3.4.1 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.1 py39hb9d737c_2 conda-forge
multidict 6.0.4 py39h72bdee0_0 conda-forge
nccl 2.14.3.1 h0800d71_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
numpy 1.24.1 py39h223a676_0 conda-forge
oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge
openssl 3.0.7 h0b41bf4_1 conda-forge
opt_einsum 3.3.0 pyhd8ed1ab_1 conda-forge
packaging 23.0 pyhd8ed1ab_0 conda-forge
pip 22.3.1 pyhd8ed1ab_0 conda-forge
pooch 1.6.0 pyhd8ed1ab_0 conda-forge
protobuf 4.21.12 py39h227be39_0 conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyjwt 2.6.0 pyhd8ed1ab_0 conda-forge
pyopenssl 23.0.0 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.9.15 hba424b6_0_cpython conda-forge
python-flatbuffers 23.1.4 pyhd8ed1ab_0 conda-forge
python_abi 3.9 3_cp39 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
re2 2022.06.01 h27087fc_1 conda-forge
readline 8.1.2 h0f457ee_0 conda-forge
requests 2.28.1 pyhd8ed1ab_1 conda-forge
requests-oauthlib 1.3.1 pyhd8ed1ab_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
scipy 1.10.0 py39h7360e5f_0 conda-forge
setuptools 65.6.3 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.9 hbd366e4_2 conda-forge
tensorboard 2.11.0 pyhd8ed1ab_0 conda-forge
tensorboard-data-server 0.6.1 py39h3ccb8fc_4 conda-forge
tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge
tensorflow 2.11.0 cuda112py39h01bd6f0_0 conda-forge
tensorflow-base 2.11.0 cuda112py39haa5674d_0 conda-forge
tensorflow-estimator 2.11.0 cuda112py39h11d7a3b_0 conda-forge
termcolor 2.2.0 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
typing-extensions 4.4.0 hd8ed1ab_0 conda-forge
typing_extensions 4.4.0 pyha770c72_0 conda-forge
tzdata 2022g h191b570_0 conda-forge
urllib3 1.26.14 pyhd8ed1ab_0 conda-forge
werkzeug 2.2.2 pyhd8ed1ab_0 conda-forge
wheel 0.38.4 pyhd8ed1ab_0 conda-forge
wrapt 1.14.1 py39hb9d737c_1 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yarl 1.8.2 py39hb9d737c_0 conda-forge
zipp 3.11.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
Environment info