I'm having a hard time making torch work nicely with cuda with Pants #general

boundless-zebra-79556

05/02/2023, 3:44 AM

I'm having a hard time making torch work nicely with CUDA within pex. To reproduce, consider:

$ docker run --rm -it --gpus all --entrypoint=bash nvidia/cuda:11.7.1-devel-ubuntu22.04

# Inside docker env:
$ apt update && apt install -y python3.10 pip
$ pip install pex
$ pex torch==2.0.0
>>> import torch
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/root/.pex/installed_wheels/7a9319a67294ef02459a19738bbfa8727bb5307b822dadd708bc2ccf6c901aca/torch-2.0.0-cp310-cp310-manylinux1_x86_64.whl/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory

$ pip install torch==2.0.0
$ python3.10
>>> import torch
>>> torch.cuda.is_available()
# etc etc Works fine

Any idea why this might be the case? I need to use this image so that it works with EKS' rather outdated GPU driver.


boundless-zebra-79556

05/02/2023, 3:45 AM

I see that this image has LD_LIBRARY_PATH set to /usr/local/nvidia/lib:/usr/local/nvidia/lib64. Any chance it's not propagated to the pex environment?
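
One way to check this from inside the pex REPL is a small diagnostic like the one below. It is only a sketch: the helper names library_path_entries and can_load are made up for illustration, and libcudnn.so.8 is simply the library named in the traceback above.

```python
import ctypes
import os


def library_path_entries(env=os.environ):
    """Return the non-empty entries of LD_LIBRARY_PATH as a list."""
    raw = env.get("LD_LIBRARY_PATH", "")
    return [entry for entry in raw.split(":") if entry]


def can_load(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


# Show what this Python process sees, then try the library from the traceback.
print("LD_LIBRARY_PATH entries:", library_path_entries())
print("libcudnn.so.8 loadable:", can_load("libcudnn.so.8"))
```

If the entries printed inside the pex match the ones printed by plain python3.10, the variable is reaching the process and the cause is likely elsewhere.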

enough-analyst-54434

05/02/2023, 4:20 AM

No, PEX does not scrub env vars the way Pants does. Please try:

pex torch==2.0 --venv --venv-site-packages-copies --pip-version latest --resolver-version pip-2020-resolver

enough-analyst-54434

05/02/2023, 4:29 AM

That's a mouthful, but it's apples to apples with the pip install you did.

boundless-zebra-79556

05/02/2023, 4:30 AM

Thanks! pex torch==2.0.0 --venv --venv-site-packages-copies seems to do the trick. Could you quickly explain how --venv-site-packages-copies might have helped in this scenario please?

boundless-zebra-79556

05/02/2023, 4:36 AM

I see from the official doc that "This can be used to work around problems with tools or libraries that are confused by symlinked source files", so I sort of get what's going on. Thanks!
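
To see whether a given venv actually contains symlinked shared objects, the kind of layout the quoted doc sentence is about, a rough sketch like this can list them. The function name symlinked_shared_objects is hypothetical, and the demo builds a throwaway directory rather than inspecting a real PEX venv. Whether symlinks are in fact what breaks torch's lookup of libcudnn is not confirmed in this thread.

```python
import os
import tempfile


def symlinked_shared_objects(root):
    """Return (path, resolved_target) pairs for symlinked .so files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Match names like libfoo.so and libfoo.so.8; keep only symlinks.
            if ".so" in name and os.path.islink(path):
                found.append((path, os.path.realpath(path)))
    return found


# Demo on a temporary tree: one real library file and one symlink to it.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "store", "libdemo.so.8")
    os.makedirs(os.path.dirname(real))
    open(real, "wb").close()

    site = os.path.join(tmp, "site-packages")
    os.makedirs(site)
    os.symlink(real, os.path.join(site, "libdemo.so.8"))

    for path, target in symlinked_shared_objects(tmp):
        print(f"{path} -> {target}")
```

Pointing the function at the site-packages directory of a venv built with and without --venv-site-packages-copies should show the difference: symlinks listed in one case, none in the other.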
