I'm having a hard time making torch work nicely with...
# general
b
I'm having a hard time making torch work nicely with CUDA within a PEX. To reproduce, consider:
$ docker run --rm -it --gpus all --entrypoint=bash nvidia/cuda:11.7.1-devel-ubuntu22.04

# Inside docker env:
$ apt update && apt install -y python3.10 pip
$ pip install pex
$ pex torch==2.0.0
>>> import torch
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/root/.pex/installed_wheels/7a9319a67294ef02459a19738bbfa8727bb5307b822dadd708bc2ccf6c901aca/torch-2.0.0-cp310-cp310-manylinux1_x86_64.whl/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory

$ pip install torch==2.0.0
$ python3.10
>>> import torch
>>> torch.cuda.is_available()
# etc etc Works fine
Any idea why this might be the case? I need to use this image so that it works with EKS' rather outdated GPU driver.
I see that this image has LD_LIBRARY_PATH set to /usr/local/nvidia/lib:/usr/local/nvidia/lib64. Any chance it's not propagated to the PEX environment?
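One way to check that from inside the PEX REPL is a quick sketch like the one below. It only reads LD_LIBRARY_PATH and asks the dynamic loader for libcudnn.so.8 via ctypes; the helper names library_path_entries and can_load are made up for illustration and are not part of PEX or torch.

```python
import ctypes
import os


def library_path_entries(env=None):
    """Return the directories listed in LD_LIBRARY_PATH (empty list if unset)."""
    env = os.environ if env is None else env
    raw = env.get("LD_LIBRARY_PATH", "")
    return [entry for entry in raw.split(":") if entry]


def can_load(soname):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical diagnostic: run once under plain python3.10 and once
    # inside the PEX REPL, then compare the two outputs.
    print("LD_LIBRARY_PATH entries:", library_path_entries())
    print("libcudnn.so.8 loadable:", can_load("libcudnn.so.8"))
```

If both runs print the same entries, the env var is reaching the PEX and the difference lies elsewhere.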
e
No, PEX does not scrub env vars the way Pants does. Please try:
pex torch==2.0 --venv --venv-site-packages-copies --pip-version latest --resolver-version pip-2020-resolver
That's a mouthful, but it's apples to apples with the pip install you did.
b
Thanks!
pex torch==2.0.0 --venv --venv-site-packages-copies
seems to do the trick. Could you quickly explain how --venv-site-packages-copies might have helped in this scenario, please?
I see from the official docs that "This can be used to work around problems with tools or libraries that are confused by symlinked source files.", so I sort of get what's going on. Thanks!
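For anyone wanting to see the difference for themselves, here is a rough sketch that counts symlinked versus regular files under the installed torch package. Running it in a PEX built with and without --venv-site-packages-copies should show whether the files are links or real copies. The helper names count_symlinks and package_dir are hypothetical, and whether PEX links individual files or whole directories is an assumption I have not verified, so the script also prints the resolved real path.

```python
import importlib.util
import os


def count_symlinks(root):
    """Walk root and return (symlinked_files, regular_files)."""
    links = regular = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.islink(os.path.join(dirpath, name)):
                links += 1
            else:
                regular += 1
    return links, regular


def package_dir(name):
    """Locate a top-level package directory without importing it, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    torch_dir = package_dir("torch")
    if torch_dir is None:
        print("torch is not installed in this environment")
    else:
        # realpath differs from the original path if any component is a symlink.
        print("package dir:", torch_dir)
        print("real path:  ", os.path.realpath(torch_dir))
        print("(symlinked, regular) files:", count_symlinks(torch_dir))
```

Because find_spec only locates the package, this runs even when import torch itself fails with the libcudnn error.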