bland-father-19717
09/15/2023, 12:07 PM

enough-analyst-54434
09/15/2023, 12:30 PM
platforms
or complete_platforms
on your pex_binary
target. I'll add more detail to the issue in a few hours about how to do this, but you might read here: https://www.pantsbuild.org/docs/reference-pex_binary#codecomplete_platformscode
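For concreteness, a minimal sketch of what that could look like in a BUILD file. The target names and the JSON file name are hypothetical; a complete_platforms file can be generated on the target machine with `pex3 interpreter inspect --markers --tags`:

file(
    name="linux_py38_platform",
    source="linux_py38_platform.json",  # hypothetical name; generated via pex3 interpreter inspect
)

pex_binary(
    name="main",
    entry_point="main.py",
    complete_platforms=[":linux_py38_platform"],
)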
bland-father-19717
09/15/2023, 12:33 PM
platforms
or complete_platforms
as you suggested. I’ll also go through the documentation link you provided. Looking forward to the additional details on the issue.

curved-television-6568
09/15/2023, 1:38 PM

bland-father-19717
09/15/2023, 1:43 PM

curved-television-6568
09/15/2023, 1:51 PM

enough-analyst-54434
09/15/2023, 1:54 PM

curved-television-6568
09/15/2023, 1:55 PM

enough-analyst-54434
09/15/2023, 1:57 PM

bland-father-19717
09/15/2023, 2:01 PM

enough-analyst-54434
09/15/2023, 2:41 PM
(example.venv) jsirois@Gill-Windows:~/support/pants/peachanG $ unzip -qc ~/downloads/torch-2.0.1-cp38-none-macosx_11_0_arm64.whl torch-2.0.1.dist-info/METADATA | grep Requires
Requires-Python: >=3.8.0
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
(example.venv) jsirois@Gill-Windows:~/support/pants/peachanG $ unzip -qc ~/downloads/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl torch-2.0.1.dist-info/METADATA | grep Requires
Requires-Python: >=3.8.0
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: nvidia-cuda-nvrtc-cu11 (==11.7.99) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-runtime-cu11 (==11.7.99) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-cupti-cu11 (==11.7.101) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cudnn-cu11 (==8.5.0.96) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cublas-cu11 (==11.10.3.66) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cufft-cu11 (==10.9.0.58) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-curand-cu11 (==10.2.10.91) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusolver-cu11 (==11.4.0.1) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusparse-cu11 (==11.7.4.91) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nccl-cu11 (==2.14.3) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nvtx-cu11 (==11.7.91) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: triton (==2.0.0) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
That differing per-wheel metadata utterly defeats Pex locking. Pex assumes all artifacts for a given project version will contain the same requirement metadata; in other words, it does not download all the available artifacts for a given version (many terabytes' worth in the torch case) to get the requirement metadata. It just downloads one. It looks like your lock downloaded the Mac wheel and thus leaves out all the nvidia requirements. As such, you can never build a proper PEX for Linux using your lock file.

enough-analyst-54434
09/15/2023, 2:43 PM

enough-analyst-54434
09/15/2023, 2:44 PM

bland-father-19717
09/15/2023, 2:53 PM

enough-analyst-54434
09/15/2023, 2:55 PM

enough-analyst-54434
09/15/2023, 2:56 PM

enough-analyst-54434
09/15/2023, 2:57 PM

enough-analyst-54434
09/15/2023, 3:09 PM
Requires-Dist: nvidia-cuda-nvrtc-cu11 (==11.7.99) ; platform_system == "Linux" and platform_machine == "x86_64"
into manual dependencies in Pants. Something like:
python_requirement(
    name="evil-torch-workaround",
    requirements=[
        "torch==2.0.1",
        'nvidia-cuda-nvrtc-cu11==11.7.99; platform_system == "Linux" and platform_machine == "x86_64"',
        ...
    ],
)
Of course, that means to bump torch, you need to go research what the full union of its requirements is, using unzip like I did above.
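If it helps, that research can be scripted. Here is a rough Python equivalent of the unzip | grep invocations above, using only the standard library; the wheel paths are the ones from this thread and assume the wheels have already been downloaded:

import zipfile

wheels = [
    "torch-2.0.1-cp38-none-macosx_11_0_arm64.whl",
    "torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl",
]
for wheel in wheels:
    print(f"== {wheel}")
    # every wheel carries its metadata at <dist>-<version>.dist-info/METADATA
    metadata = zipfile.ZipFile(wheel).read("torch-2.0.1.dist-info/METADATA")
    for line in metadata.decode().splitlines():
        if line.startswith("Requires-"):
            print(line)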
bland-father-19717
09/15/2023, 3:16 PM

bland-father-19717
09/18/2023, 2:21 PM
/usr/local/bin/python3.8: can't find '__main__' module in '/bin/whisper'
Details: https://github.com/pantsbuild/pants/issues/19505#issuecomment-1723514425

bland-father-19717
09/18/2023, 2:23 PM
pants run src/python/main/whisper/main.py
❌: pants run src/python/main/whisper:main
❌: pants run src/python/main/whisper/Dockerfile
❌: pants run src/python/main/whisper:whisper_docker
enough-analyst-54434
09/18/2023, 6:41 PM
__main__.py
inside even though it is there (I had to install zip and unzip inside the image but I omitted these steps below):
$ docker run --rm -it --entrypoint bash whisper_docker:latest
root@f8fa69b3a6f2:/# ls -lrth /bin/whisper
-r-xr-xr-x 1 root root 2.3G Sep 18 17:25 /bin/whisper
root@f8fa69b3a6f2:/# zipinfo /bin/whisper | tail -1
19515 files, 4576079657 bytes uncompressed, 2368282036 bytes compressed: 48.2%
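Note the uncompressed total above is past 4GiB, the point where a zip needs zip64 extensions, and CPython 3.8's zipimport does not support zip64; that is likely why Python cannot see the __main__.py. A small stdlib sketch of the same inspection, handy for spotting the entries that dominate (the path is this thread's example):

import zipfile

with zipfile.ZipFile("/bin/whisper") as zf:
    infos = zf.infolist()
print(f"{len(infos)} files, {sum(i.file_size for i in infos):,} bytes uncompressed")
# the biggest entries are the natural candidates for the experiment below
for info in sorted(infos, key=lambda i: i.file_size, reverse=True)[:5]:
    print(f"{info.file_size:>13,} {info.filename}")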
As an experiment I removed 1 ~600MB file from the zip to bring it under 4GB uncompressed:
root@f8fa69b3a6f2:/# cp /bin/whisper /bin/whisper.zip
root@f8fa69b3a6f2:/# zip -d /bin/whisper.zip .deps/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl/torch/lib/libtorch_cuda.so
deleting: .deps/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl/torch/lib/libtorch_cuda.so
zip warning: Local Version Needed To Extract does not match CD: .deps/torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl/torch/lib/libtorch_cuda_linalg.so
...
root@f8fa69b3a6f2:/# ls -lrth /bin | grep whisper
-r-xr-xr-x 1 root root 2.3G Sep 18 17:25 whisper
-r-xr-xr-x 1 root root 1.9G Sep 18 17:41 whisper.zip
root@f8fa69b3a6f2:/# zipinfo /bin/whisper.zip | tail -1
19514 files, 3919218968 bytes uncompressed, 1970217874 bytes compressed: 49.7%
That then ~works:
root@f8fa69b3a6f2:/# whisper.zip
Traceback (most recent call last):
File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/pex", line 274, in <module>
runpy.run_module(module_name, run_name="__main__", alter_sys=True)
File "/usr/local/lib/python3.8/runpy.py", line 207, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/local/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/lib/python3.8/site-packages/main/whisper/main.py", line 1, in <module>
import whisper
File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/lib/python3.8/site-packages/whisper/__init__.py", line 8, in <module>
import torch
File "/root/.pex/venvs/75e1762e0292c94b683f75ebf8977148b3c6943e/5fd7049af63e03f347278c89401424cd9731df9a/lib/python3.8/site-packages/torch/__init__.py", line 229, in <module>
from torch._C import * # noqa: F403
ImportError: libtorch_cuda.so: cannot open shared object file: No such file or directory
So ... your easiest option is to use the pex_binary
support for Pex's packed layout, which packages the PEX in a special directory-based format instead of in a zip file. You get this via `layout="packed"`: https://www.pantsbuild.org/docs/reference-pex_binary#codelayoutcode
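In BUILD terms that is one extra field; a sketch, with the target name and entry point assumed:

pex_binary(
    name="main",
    entry_point="main.py",
    layout="packed",  # a directory-based layout, so no single >4GiB zip to import from
)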
When I add that layout="packed"
to your example repo pex_binary
target and update the Dockerfile entrypoint to:
ENTRYPOINT ["/usr/local/bin/python3.8", "/bin/whisper"]
I get:
$ time docker run --rm -it whisper_docker:latest
/root/.pex/installed_wheels/0d1004abc525c92a0e0befc850db2ffe4b4f80e9eb8875b1459d5a3a270880be/openai_whisper-20230314-py3-none-any.whl/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See <https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit> for details.
def backtrace(trace: np.ndarray):
100%|███████████████████████████████████████| 139M/139M [00:13<00:00, 10.6MiB/s]
Whisper(
(encoder): AudioEncoder(
(conv1): Conv1d(80, 512, kernel_size=(3,), stride=(1,), padding=(1,))
(conv2): Conv1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,))
(blocks): ModuleList(
(0-5): 6 x ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
(ln_post): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(decoder): TextDecoder(
(token_embedding): Embedding(51865, 512)
(blocks): ModuleList(
(0-5): 6 x ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
(ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
real 0m41.540s
user 0m0.010s
sys 0m0.020s
So I think that solves all the issues. The startup time is horrendous, though, and it need not be (at the expense of some extra docker image build time). To move the startup overhead into docker image build overhead, also add execution_mode="venv"
to your pex_binary
target and change the Dockerfile to be like so:
FROM python:3.8.17-slim-bullseye
COPY src.python.main.whisper/main.pex /tmp/main.pex
RUN \
    PEX_TOOLS=1 /usr/local/bin/python3.8 /tmp/main.pex venv \
        --remove all \
        --compile \
        --bin-path prepend \
        /bin/whisper
ENTRYPOINT ["/bin/whisper/pex"]
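For reference, the pex_binary target would then carry both fields; again a sketch with assumed names:

pex_binary(
    name="main",
    entry_point="main.py",
    layout="packed",
    execution_mode="venv",  # paired with the Dockerfile's PEX_TOOLS venv step above
)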
Then you get:
$ time docker run --rm -it whisper_docker:latest
/bin/whisper/lib/python3.8/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See <https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit> for details.
def backtrace(trace: np.ndarray):
100%|███████████████████████████████████████| 139M/139M [00:10<00:00, 14.3MiB/s]
Whisper(
(encoder): AudioEncoder(
(conv1): Conv1d(80, 512, kernel_size=(3,), stride=(1,), padding=(1,))
(conv2): Conv1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,))
(blocks): ModuleList(
(0-5): 6 x ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
(ln_post): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
(decoder): TextDecoder(
(token_embedding): Embedding(51865, 512)
(blocks): ModuleList(
(0-5): 6 x ResidualAttentionBlock(
(attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(cross_attn): MultiHeadAttention(
(query): Linear(in_features=512, out_features=512, bias=True)
(key): Linear(in_features=512, out_features=512, bias=False)
(value): Linear(in_features=512, out_features=512, bias=True)
(out): Linear(in_features=512, out_features=512, bias=True)
)
(cross_attn_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(0): Linear(in_features=512, out_features=2048, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=2048, out_features=512, bias=True)
)
(mlp_ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
(ln): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
)
real 0m12.823s
user 0m0.000s
sys 0m0.028s
Still pretty bad, but better.

enough-analyst-54434
09/18/2023, 6:48 PM

enough-analyst-54434
09/18/2023, 6:56 PM

enough-analyst-54434
09/18/2023, 6:58 PM

enough-analyst-54434
09/18/2023, 11:31 PM

curved-television-6568
09/18/2023, 11:45 PM
coddle-mode: Option[bool]=False
enough-analyst-54434
09/18/2023, 11:54 PM

curved-television-6568
09/19/2023, 1:11 AM

bland-father-19717
09/19/2023, 1:11 AM

enough-analyst-54434
09/19/2023, 2:10 AM

enough-analyst-54434
09/30/2023, 2:00 AM
--check error
or expose the toggle if it wishes): https://github.com/pantsbuild/pex/pull/2253

bland-father-19717
09/30/2023, 2:07 AM