# general
b
I opened an issue on the GitHub repo (https://github.com/pantsbuild/pants/issues/19505). If there’s any additional information you need from me, please let me know. I’m eager to help in any way I can.
e
The issue is you build the PEX on and for Mac and then try to use it in a Linux container. This will never work for any PEX with native dependencies (you have several of these according to the error message). To build a PEX for a foreign platform you need to specify `platforms` or `complete_platforms` on your `pex_binary` target. I'll add more detail to the issue in a few hours about how to do this, but you might read here: https://www.pantsbuild.org/docs/reference-pex_binary#codecomplete_platformscode
b
Thank you so much for the clarification. I wasn’t aware of the platform-specific intricacies with PEX. I’ll try building the PEX with `platforms` or `complete_platforms` as you suggested. I’ll also go through the documentation link you provided. Looking forward to the additional details on the issue.
c
there’s plenty more in this slack if you search using cross platform pex docker as keywords.. 😉
b
I’m about to test it now. However, I’d like to mention that I encountered the same error even when I built it in the same environment as the execution environment. I’ve been using pantsbuild cross-platform without any issues, but this problem arose for the first time when I added whisper. I suspect there might be an issue with the wheels of dependencies like pytorch that whisper relies on. Nevertheless, I’ll start by trying the build in the same environment as you suggested.
c
yes, as long as you have platform-agnostic dependencies, there are no issues 😉
e
I read too fast - you're trying to use a docker environment which means the PEX file is built inside docker which negates all I said. The new issue though is you're trying to target a Linux x86_64 image from Mac arm. Still digging, but that is territory I'm less familiar with.
c
ah, the docker container might be running as an arm platform…
e
Yeah @bland-father-19717 one missing step in your repro setup is providing a README that shows the commands you run that lead to failure. It's great you provided a repo, most do not do that, but that missing bit can be critical.
b
I apologize, I’ll add the command to reproduce the issue.
e
Ok, even when you get past the arm / x86_64 platform mismatch issue, you'll then hit this lovely bit of insanity (I knew this in the past but had forgotten):
Copy code
(example.venv) jsirois@Gill-Windows:~/support/pants/peachanG $ unzip -qc ~/downloads/torch-2.0.1-cp38-none-macosx_11_0_arm64.whl torch-2.0.1.dist-info/METADATA | grep Requires
Requires-Python: >=3.8.0
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
(example.venv) jsirois@Gill-Windows:~/support/pants/peachanG $ unzip -qc ~/downloads/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl torch-2.0.1.dist-info/METADATA | grep Requires
Requires-Python: >=3.8.0
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: nvidia-cuda-nvrtc-cu11 (==11.7.99) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-runtime-cu11 (==11.7.99) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-cupti-cu11 (==11.7.101) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cudnn-cu11 (==8.5.0.96) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cublas-cu11 (==11.10.3.66) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cufft-cu11 (==10.9.0.58) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-curand-cu11 (==10.2.10.91) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusolver-cu11 (==11.4.0.1) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusparse-cu11 (==11.7.4.91) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nccl-cu11 (==2.14.3) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nvtx-cu11 (==11.7.91) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: triton (==2.0.0) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
That differing metadata per wheel defeats Pex locking utterly. Pex assumes all artifacts for a given project version will contain the same requirement metadata; in other words, it does not download all the available artifacts for a given version (many terabytes' worth in the torch case) to get the requirement metadata. It just downloads one. It looks like your lock downloaded the Mac wheel and thus leaves out all the nvidia requirements. As such, you can never build a proper PEX for Linux using your lock file.
@bland-father-19717 you might search slack and Pants issues for torch or pytorch. Others have run into torch difficulties and they have - iirc - hacky solutions of one form or the other. Torch is just a very bad Python ecosystem citizen here. It presents an important - due to popularity - but hard to solve case.
If you were just using Pex alone, you'd not create a universal lock (which is what Pants does), but instead create a lock for Mac and another lock for Linux and this would all just work. So Pants gets in the way here on top of the torch madness.
b
I’m well aware of the challenges posed by pytorch, and it’s indeed a tricky situation. However, I can’t dismiss the requests from our data scientists who wish to use pytorch. I’ll try to see if it works correctly in a Linux-only environment.
e
@bland-father-19717 it absolutely will not work using a Pants-generated lockfile.
😱 1
It might work by luck only if you generate the lockfile on a Linux machine. In that case the primary artifact where the metadata is read from may be the linux wheel which has the nvidia requirements in its metadata.
So, the thing you really need to understand is the differing unzip listings I pasted above. That is the fundamental issue here (combined with Pants' use of --style universal locks).
👀 1
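(To make the per-wheel metadata difference concrete, here is a small stdlib-only sketch that diffs the Requires-Dist entries of the two wheels from the listings above; the wheel paths are assumed to be in the current directory:)
```python
import zipfile
from email.parser import Parser


def requires_dist(wheel_path):
    # Wheel METADATA uses RFC 822 style headers, so the stdlib email parser handles it.
    with zipfile.ZipFile(wheel_path) as zf:
        raw = zf.read("torch-2.0.1.dist-info/METADATA").decode("utf-8")
    return set(Parser().parsestr(raw).get_all("Requires-Dist") or [])


mac = requires_dist("torch-2.0.1-cp38-none-macosx_11_0_arm64.whl")
linux = requires_dist("torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl")

# Prints the nvidia-* and triton requirements that only appear in the Linux wheel's metadata.
for req in sorted(linux - mac):
    print(req)
```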
One way would be to transform all the `Requires-Dist: nvidia-cuda-nvrtc-cu11 (==11.7.99) ; platform_system == "Linux" and platform_machine == "x86_64"` entries into manual dependencies in Pants. Something like:
Copy code
python_requirement(
    name="evil-torch-workaround",
    requirements=[
        "torch==2.0.1",
        'nvidia-cuda-nvrtc-cu11==11.7.99; platform_system == "Linux" and platform_machine == "x86_64"',
        ...
    ],
)
Of course, that means that to bump torch, you need to go research what the full union of its requirements is, using unzip as I did above.
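(If dependency inference does not wire `import torch` up to that target on its own, for example because torch also appears in a requirements.txt, the consuming sources can point at it explicitly; a sketch with a hypothetical target address:)
```python
# BUILD for the code that imports torch (sketch; the address below is hypothetical)
python_sources(
    name="lib",
    dependencies=[
        "3rdparty/python:evil-torch-workaround",
    ],
)
```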
b
Oh, thank you! I’ll give it a try.
@enough-analyst-54434 I tested it in an AWS EC2 environment and encountered the following error. Do you happen to know a solution?
Copy code
/usr/local/bin/python3.8: can't find '__main__' module in '/bin/whisper'
Details: https://github.com/pantsbuild/pants/issues/19505#issuecomment-1723514425
Commands:
pants run src/python/main/whisper/main.py
pants run src/python/main/whisper:main
pants run src/python/main/whisper/Dockerfile
pants run src/python/main/whisper:whisper_docker
e
Yeah, so - I know you know, but I'll reiterate that torch is insane. The PEX zip is too big (~2.3 GB compressed and ~4.5 GB uncompressed) for the Python zipimporter (which is what handles launching zipapps). Although Python the language has a zipfile module that handles zip64, the zipimporter uses different C code that does not. As such, the zipimporter launcher fails to read the zip properly and cannot find the `__main__.py` inside even though it is there (I had to install zip and unzip inside the image, but I omitted these steps below):
Copy code
$ docker run --rm -it --entrypoint bash whisper_docker:latest
root@f8fa69b3a6f2:/# ls -lrth /bin/whisper
-r-xr-xr-x 1 root root 2.3G Sep 18 17:25 /bin/whisper
root@f8fa69b3a6f2:/# zipinfo /bin/whisper | tail -1
19515 files, 4576079657 bytes uncompressed, 2368282036 bytes compressed:  48.2%
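(Aside: the stdlib zipfile module, which unlike the zipimporter's C code does handle zip64, can pull the same numbers without installing zip/unzip in the image; a rough sketch, not an exact zip64 determination:)
```python
import zipfile


# Mirror the `zipinfo | tail -1` summary above using only the stdlib.
def zip_stats(path):
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        return {
            "entries": len(infos),
            "uncompressed_bytes": sum(i.file_size for i in infos),
            "compressed_bytes": sum(i.compress_size for i in infos),
        }


stats = zip_stats("/bin/whisper")
print(stats)
# The classic (non-zip64) zip format caps sizes and offsets at 0xFFFFFFFF (~4 GiB)
# and entry counts at 65535; this PEX's ~4.5 GB uncompressed payload is what pushes
# it over the edge, per the experiment below.
print(stats["uncompressed_bytes"] > 0xFFFFFFFF)
```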
As an experiment I removed 1 ~600MB file from the zip to bring it under 4GB uncompressed:
Copy code
root@f8fa69b3a6f2:/# cp /bin/whisper /bin/whisper.zip
root@f8fa69b3a6f2:/# zip -d /bin/whisper.zip .deps/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl/torch/lib/libtorch_cuda.so
deleting: .deps/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl/torch/lib/libtorch_cuda.so
        zip warning: Local Version Needed To Extract does not match CD: .deps/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl/torch/lib/libtorch_cuda_linalg.so
...
root@f8fa69b3a6f2:/# ls -lrth /bin | grep whisper
-r-xr-xr-x 1 root root 2.3G Sep 18 17:25 whisper
-r-xr-xr-x 1 root root 1.9G Sep 18 17:41 whisper.zip
root@f8fa69b3a6f2:/# zipinfo /bin/whisper.zip | tail -1
19514 files, 3919218968 bytes uncompressed, 1970217874 bytes compressed:  49.7%
That then ~works:
Copy code
root@f8fa69b3a6f2:/# whisper.zip
Traceback (most recent call last):
  File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/pex", line 274, in <module>
    runpy.run_module(module_name, run_name="__main__", alter_sys=True)
  File "/usr/local/lib/python3.8/runpy.py", line 207, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/local/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/lib/python3.8/site-packages/main/whisper/main.py", line 1, in <module>
    import whisper
  File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/lib/python3.8/site-packages/whisper/__init__.py", line 8, in <module>
    import torch
  File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/lib/python3.8/site-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libtorch_cuda.so: cannot open shared object file: No such file or directory
So ... your easiest option is to use the `pex_binary` support for Pex's packed layout, which packages the PEX in a special directory-based format instead of in a zip file. You get this via `layout="packed"`: https://www.pantsbuild.org/docs/reference-pex_binary#codelayoutcode When I add `layout="packed"` to the `pex_binary` target in your example repo and update the Dockerfile entrypoint to:
Copy code
ENTRYPOINT ["/usr/local/bin/python3.8", "/bin/whisper"]
I get:
Copy code
$ time docker run --rm -it whisper_docker:latest
/root/.pex/installed_wheels/0d1004abc525c92a0e0befc850db2ffe4b4f80e9eb8875b1459d5a3a270880be/openai_whisper-20230314-py3-none-any.whl/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See <https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit> for details.
  def backtrace(trace: np.ndarray):
100%|███████████████████████████████████████| 139M/139M [00:13<00:00, 10.6MiB/s]
Whisper(
  (encoder): AudioEncoder(
    (conv1): Conv1d(80, 512, kernel_size=(3,), stride=(1,), padding=(1,))
    (conv2): Conv1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,))
    (blocks): ModuleList(
      (0-5): 6 x ResidualAttentionBlock(
        (attn): MultiHeadAttention(
          (query): Linear(in_features=512, out_features=512, bias=True)
          (key): Linear(in_features=512, out_features=512, bias=False)
          (value): Linear(in_features=512, out_features=512, bias=True)
          (out): Linear(in_features=512, out_features=512, bias=True)
        )
        (attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (0): Linear(in_features=512, out_features=2048, bias=True)
          (1): GELU(approximate='none')
          (2): Linear(in_features=2048, out_features=512, bias=True)
        )
        (mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      )
    )
    (ln_post): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TextDecoder(
    (token_embedding): Embedding(51865, 512)
    (blocks): ModuleList(
      (0-5): 6 x ResidualAttentionBlock(
        (attn): MultiHeadAttention(
          (query): Linear(in_features=512, out_features=512, bias=True)
          (key): Linear(in_features=512, out_features=512, bias=False)
          (value): Linear(in_features=512, out_features=512, bias=True)
          (out): Linear(in_features=512, out_features=512, bias=True)
        )
        (attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (cross_attn): MultiHeadAttention(
          (query): Linear(in_features=512, out_features=512, bias=True)
          (key): Linear(in_features=512, out_features=512, bias=False)
          (value): Linear(in_features=512, out_features=512, bias=True)
          (out): Linear(in_features=512, out_features=512, bias=True)
        )
        (cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (0): Linear(in_features=512, out_features=2048, bias=True)
          (1): GELU(approximate='none')
          (2): Linear(in_features=2048, out_features=512, bias=True)
        )
        (mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      )
    )
    (ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
)

real    0m41.540s
user    0m0.010s
sys     0m0.020s
So I think that solves all the issues. The startup time is horrendous though, and it need not be (at the expense of some extra Docker image build time). To move the startup overhead into Docker image build overhead, also add `execution_mode="venv"` to your `pex_binary` target and change the Dockerfile to be like so:
Copy code
FROM python:3.8.17-slim-bullseye

COPY src.python.main.whisper/main.pex /tmp/main.pex
RUN \
    PEX_TOOLS=1 /usr/local/bin/python3.8 /tmp/main.pex venv \
        --remove all \
        --compile \
        --bin-path prepend \
        /bin/whisper

ENTRYPOINT ["/bin/whisper/pex"]
Then you get:
Copy code
$ time docker run --rm -it whisper_docker:latest
/bin/whisper/lib/python3.8/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See <https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit> for details.
  def backtrace(trace: np.ndarray):
100%|███████████████████████████████████████| 139M/139M [00:10<00:00, 14.3MiB/s]
Whisper(
  (encoder): AudioEncoder(
    (conv1): Conv1d(80, 512, kernel_size=(3,), stride=(1,), padding=(1,))
    (conv2): Conv1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,))
    (blocks): ModuleList(
      (0-5): 6 x ResidualAttentionBlock(
        (attn): MultiHeadAttention(
          (query): Linear(in_features=512, out_features=512, bias=True)
          (key): Linear(in_features=512, out_features=512, bias=False)
          (value): Linear(in_features=512, out_features=512, bias=True)
          (out): Linear(in_features=512, out_features=512, bias=True)
        )
        (attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (0): Linear(in_features=512, out_features=2048, bias=True)
          (1): GELU(approximate='none')
          (2): Linear(in_features=2048, out_features=512, bias=True)
        )
        (mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      )
    )
    (ln_post): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TextDecoder(
    (token_embedding): Embedding(51865, 512)
    (blocks): ModuleList(
      (0-5): 6 x ResidualAttentionBlock(
        (attn): MultiHeadAttention(
          (query): Linear(in_features=512, out_features=512, bias=True)
          (key): Linear(in_features=512, out_features=512, bias=False)
          (value): Linear(in_features=512, out_features=512, bias=True)
          (out): Linear(in_features=512, out_features=512, bias=True)
        )
        (attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (cross_attn): MultiHeadAttention(
          (query): Linear(in_features=512, out_features=512, bias=True)
          (key): Linear(in_features=512, out_features=512, bias=False)
          (value): Linear(in_features=512, out_features=512, bias=True)
          (out): Linear(in_features=512, out_features=512, bias=True)
        )
        (cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (0): Linear(in_features=512, out_features=2048, bias=True)
          (1): GELU(approximate='none')
          (2): Linear(in_features=2048, out_features=512, bias=True)
        )
        (mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      )
    )
    (ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
)

real    0m12.823s
user    0m0.000s
sys     0m0.028s
Still pretty bad, but better.
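(For reference, the `pex_binary` settings from this thread pulled together in one place; the target and entry point names are assumptions based on the paths mentioned above:)
```python
# BUILD (sketch)
pex_binary(
    name="main",
    entry_point="main.py",
    # Directory-based packed layout instead of a single zipapp; sidesteps the
    # zipimporter's inability to read zip64 archives.
    layout="packed",
    # Pair with the Dockerfile above that pre-builds the venv via PEX_TOOLS,
    # moving the startup overhead into the docker image build.
    execution_mode="venv",
)
```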
Ok, @bland-father-19717 and @curved-television-6568 I've updated the issue with this worked explanation: https://github.com/pantsbuild/pants/issues/19505#issuecomment-1724181440
🙏 2
👍 1
And, FWIW, here is the CPython tracking issue for zipimporter vs zip64: https://bugs.python.org/issue32959 (https://github.com/python/cpython/issues/77140)
Actually, this looks like the best tracking issue for the ongoing work to fix this zipimport issue: https://github.com/python/cpython/pull/94146
And here's a bug to maybe warn or fail fast at PEX zip creation time when the zip64 conditions are met: https://github.com/pantsbuild/pex/issues/2247 This is not easy to get right though, from what I can tell, and I always favor not getting in the way of an advanced user who knows what they're doing over coddling; so we'll see.
👀 1
c
coddle-mode: Option[bool]=False
e
If that means "definitive nyet", Tsoding approves.
😂 1
c
it hints at a coddling mode, that you can toggle.. 😁
b
@enough-analyst-54434 I wanted to express my deep appreciation for the in-depth analysis, solution, and optimization suggestions you provided on GitHub. After implementing the changes, everything is running smoothly in my environment, and the optimizations have significantly improved performance. Your clear and detailed guidance was invaluable. I’m grateful to be a part of such a knowledgeable and helpful community. Thanks again for all the support!
❤️ 1
e
You're welcome @bland-father-19717. Note that there is likely more optimization you could be doing depending on your development / deploy lifecycle shape, see https://blog.pantsbuild.org/optimizing-python-docker-deploys-using-pants/ for ideas.
🙏 1
👍 1
Ok, and here is a new feature that now warns by default when the generated PEX zipapp is too big (Pants could choose to `--check error` or expose the toggle if it wishes): https://github.com/pantsbuild/pex/pull/2253
b
Excellent! Thanks!