steep-controller-23208
08/11/2022, 1:38 PM
`Building ${n} requirements for ${pexfile} from the lockfiles/python-default.lock`, even though the lock does not change.
Is this a required step when building a PEX, or is there any way to speed it up?
Thank you in advance.
happy-kitchen-89482
08/11/2022, 1:40 PM
steep-controller-23208
08/11/2022, 1:41 PM
happy-kitchen-89482
08/11/2022, 1:52 PM
hundreds-father-404
08/11/2022, 1:58 PM
> This shouldn’t be happening, the results of the requirements build should be cached locally.
Well, sort of. As explained at https://www.pantsbuild.org/docs/python-third-party-dependencies, Pants uses the precise subset of the lockfile you need for a particular task, which gives benefits like fine-grained caching.
Assuming you're using a Pex-generated lockfile (the default in 2.12+), then we will install the subset we need. The download step should be cached, and you already paid the cost of "resolution" (figuring out what versions to use) when the lockfile was created. But it's still not instant.
This is controlled by this option: https://www.pantsbuild.org/docs/reference-python#section-run-against-entire-lockfile
steep-controller-23208
08/11/2022, 1:58 PM
When I made changes to my code, like `print(1)`, it happens.
But I expect Pants to cache the third-party dependencies and skip this process, because I did not change them.
hundreds-father-404
08/11/2022, 2:00 PM
> When I made changes to my code, like `print(1)`, it happens.
That is definitely not expected, unless you added some new third-party requirement to the file, where the subset of deps has not been encountered before.
> where the subset of deps has not been encountered before
The same "requirements.pex" gets shared across many builds. And even when it's not, the downloading of the dependencies should still be shared across every build, such that creating a new requirements.pex is faster.
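A minimal sketch of the caching idea described here, not Pants's actual implementation: results are keyed on the lockfile plus the exact subset of requirements, so a source-only edit reuses the earlier result while a previously unseen subset triggers a new build. The names `build_requirements_pex`, `get_requirements_pex`, the `"lock-abc"` digest, and the in-memory dict are all hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical in-memory cache: (lockfile digest, requirement subset) -> artifact name.
_cache: Dict[Tuple[str, FrozenSet[str]], str] = {}
build_count = 0


def build_requirements_pex(lock_digest: str, reqs: FrozenSet[str]) -> str:
    """Stand-in for the expensive "Building N requirements" step."""
    global build_count
    build_count += 1
    return f"requirements-{build_count}.pex"


def get_requirements_pex(lock_digest: str, reqs: FrozenSet[str]) -> str:
    """Return the cached artifact for this key, building it only on a miss."""
    key = (lock_digest, reqs)
    if key not in _cache:
        _cache[key] = build_requirements_pex(lock_digest, reqs)
    return _cache[key]


first = get_requirements_pex("lock-abc", frozenset({"boto3", "pandas"}))
# Editing source code (e.g. adding print(1)) does not change the key: cache hit.
second = get_requirements_pex("lock-abc", frozenset({"pandas", "boto3"}))
# A subset that has not been encountered before is a miss and triggers a build.
third = get_requirements_pex("lock-abc", frozenset({"boto3", "pandas", "lightgbm"}))
```

Only two builds happen for the three lookups, because the first two share a key.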
happy-kitchen-89482
08/11/2022, 2:10 PM
steep-controller-23208
08/11/2022, 2:14 PM
> Assuming you're using a Pex-generated lockfile (the default in 2.12+), then we will install the subset we need. The download step should be cached, and you already paid the cost of "resolution" (figuring out what versions to use) when the lockfile was created. But it's still not instant.
Yes, I am using a Pex-generated lockfile, and I found that the third-party dependencies are already downloaded on my machine, because the build steps succeed in an offline environment,
`print(1)`, then find that the build process happens every time.
`./pants package project2/project2:main`
(Of course, when exactly the same code is built again, the cache works as expected and the process finishes instantly.)
enough-analyst-54434
08/11/2022, 3:58 PM
$ time ./pants --no-process-cleanup package project2/project2:main
08:55:46.97 [INFO] Preserving local process execution dir /tmp/process-executionMoTCqo for "Determine Python dependencies for project2/project2/main.py"
08:55:46.99 [INFO] Preserving local process execution dir /tmp/process-executionALkzsU for "Building 4 requirements for project2.project2/project2_main-main.pex from the lockfiles/python-default.lock resolve: boto3-stubs<2.0.0,>=1.24.49, boto3<2.0.0,>=1.24.49, lightgbm<4.0.0,>=3.3.2, pandas<2.0.0,>=1.4.3"
08:55:59.13 [INFO] Completed: Building 4 requirements for project2.project2/project2_main-main.pex from the lockfiles/python-default.lock resolve: boto3-stubs<2.0.0,>=1.24.49, boto3<2.0.0,>=1.24.49, lightgbm<4.0.0,>=3.3.2, pandas<... (13 characters truncated)
08:55:59.17 [INFO] Wrote dist/project2.project2/project2_main-main.pex
real 0m12.716s
user 0m0.406s
sys 0m0.053s
Running the PEX creation process directly:
$ time /tmp/process-executionALkzsU/__run.sh
real 0m11.918s
user 0m11.644s
sys 0m0.270s
So there is ~800ms of Pants overhead there.
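The overhead figure is simply the difference between the two wall-clock ("real") times reported above. A small worked check, with the values copied from those runs:

```python
# Wall-clock ("real") times copied from the two runs above, in seconds.
pants_total = 12.716  # ./pants --no-process-cleanup package project2/project2:main
pex_direct = 11.918   # running __run.sh directly

overhead_ms = (pants_total - pex_direct) * 1000
print(f"Pants overhead: {overhead_ms:.0f} ms")  # prints: Pants overhead: 798 ms
```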
Breaking down the PEX generation timings shows:
$ time PEX_VERBOSE=1 /tmp/process-executionALkzsU/__run.sh
...
pex: Laying out PEX zipfile local_dists.pex: 0.1ms
pex: Resolving distributions (boto3-stubs<2.0.0,>=1.24.49 boto3<2.0.0,>=1.24.49 lightgbm<4.0.0,>=3.3.2 pandas<2.0.0,>=1.4.3): 1308.2ms
pex: Parsing lock lockfiles/python-default.lock: 441.2ms
pex: Resolving requirements from lock file lockfiles/python-default.lock: 866.5ms
pex: Parsing requirements: 1.6ms
pex: Resolving urls to fetch for 4 requirements from lock lockfiles/python-default.lock: 19.3ms
pex: Downloading 21 distributions to satisfy 4 requirements: 7.2ms
pex: Categorizing 21 downloaded artifacts: 0.1ms
pex: Building 0 artifacts and installing 21: 831.3ms
pex: Calculating project names for direct requirements:
PyPIRequirement(line=LogicalLine(raw_text='boto3-stubs<2.0.0,>=1.24.49', processed_text='boto3-stubs<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3-stubs', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
PyPIRequirement(line=LogicalLine(raw_text='boto3<2.0.0,>=1.24.49', processed_text='boto3<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
PyPIRequirement(line=LogicalLine(raw_text='lightgbm<4.0.0,>=3.3.2', processed_text='lightgbm<4.0.0,>=3.3.2', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='lightgbm', url=None, extras=frozenset(), specifier=<SpecifierSet('<4.0.0,>=3.3.2')>, marker=None), editable=False)
PyPIRequirement(line=LogicalLine(raw_text='pandas<2.0.0,>=1.4.3', processed_text='pandas<2.0.0,>=1.4.3', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='pandas', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.4.3')>, marker=None), editable=False): 0.0ms
pex: Installing 21 distributions: 830.1ms
pex: Zipping PEX file.: 10239.8ms
real 0m11.999s
user 0m11.784s
sys 0m0.210s
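One way to read a breakdown like this is to parse the `pex: <step>: <N>ms` lines and rank them. This is a minimal sketch using a handful of lines copied from the output above; the regular expression is an assumption about the line shape seen in this log, not a documented PEX format.

```python
import re

# A few lines copied from the PEX_VERBOSE=1 output above.
log = """\
pex: Parsing lock lockfiles/python-default.lock: 441.2ms
pex: Resolving requirements from lock file lockfiles/python-default.lock: 866.5ms
pex: Downloading 21 distributions to satisfy 4 requirements: 7.2ms
pex: Installing 21 distributions: 830.1ms
pex: Zipping PEX file.: 10239.8ms
"""

# Match "pex: <step>: <number>ms" and capture the step name and its duration.
pattern = re.compile(r"^pex: (?P<step>.+): (?P<ms>\d+(?:\.\d+)?)ms$")

timings = {}
for line in log.splitlines():
    match = pattern.match(line)
    if match:
        timings[match.group("step")] = float(match.group("ms"))

# Rank the steps from slowest to fastest.
slowest = max(timings, key=timings.get)
for step, ms in sorted(timings.items(), key=lambda item: item[1], reverse=True):
    print(f"{ms:>9.1f} ms  {step}")
```

Some of the timers appear to be nested (for example, 441.2ms + 866.5ms is close to the 1308.2ms reported for resolving distributions), so the values are ranked here rather than summed.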
So almost all the time is spent zipping.
Editing `__run.sh` to use PEX's `--no-compress` feature:
$ diff /tmp/process-executionALkzsU/__run.sh.orig /tmp/process-executionALkzsU/__run.sh
5c5
< /home/jsirois/.pyenv/versions/3.10.6/bin/python ./pex --tmpdir .tmp --jobs 4 --python-path $'/home/jsirois/.pyenv/versions/2.7.18/bin:/home/jsirois/.pyenv/versions/3.10.6/bin:/home/jsirois/.pyenv/versions/3.11.0b5/bin:/home/jsirois/.pyenv/versions/3.6.15/bin:/home/jsirois/.pyenv/versions/3.7.13/bin:/home/jsirois/.pyenv/versions/3.8.13/bin:/home/jsirois/.pyenv/versions/3.9.13/bin:/home/jsirois/.pyenv/versions/pypy2.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.6-7.3.3/bin:/home/jsirois/.pyenv/versions/pypy3.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.8-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.9-7.3.9/bin:/home/jsirois/.pyenv/shims:/home/jsirois/Downloads/google-cloud-sdk/bin:/home/jsirois/.cargo/bin:/home/jsirois/.pyenv/bin:/home/jsirois/bin:/home/jsirois/.local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/var/lib/flatpak/exports/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/jvm/default/bin:/home/jsirois/go/bin' --output-file project2.project2/project2_main-main.pex --no-emit-warnings --manylinux manylinux2014 --requirements-pex local_dists.pex --interpreter-constraint $'CPython==3.8.*' --entry-point $'project2.main:main' $'--sources-directory=source_files' $'boto3-stubs<2.0.0,>=1.24.49' $'boto3<2.0.0,>=1.24.49' $'lightgbm<4.0.0,>=3.3.2' $'pandas<2.0.0,>=1.4.3' --lock lockfiles/python-default.lock --no-pypi $'--index=<https://pypi.org/simple/>' --layout zipapp
---
> /home/jsirois/.pyenv/versions/3.10.6/bin/python ./pex --tmpdir .tmp --jobs 4 --python-path $'/home/jsirois/.pyenv/versions/2.7.18/bin:/home/jsirois/.pyenv/versions/3.10.6/bin:/home/jsirois/.pyenv/versions/3.11.0b5/bin:/home/jsirois/.pyenv/versions/3.6.15/bin:/home/jsirois/.pyenv/versions/3.7.13/bin:/home/jsirois/.pyenv/versions/3.8.13/bin:/home/jsirois/.pyenv/versions/3.9.13/bin:/home/jsirois/.pyenv/versions/pypy2.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.6-7.3.3/bin:/home/jsirois/.pyenv/versions/pypy3.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.8-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.9-7.3.9/bin:/home/jsirois/.pyenv/shims:/home/jsirois/Downloads/google-cloud-sdk/bin:/home/jsirois/.cargo/bin:/home/jsirois/.pyenv/bin:/home/jsirois/bin:/home/jsirois/.local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/var/lib/flatpak/exports/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/jvm/default/bin:/home/jsirois/go/bin' --output-file project2.project2/project2_main-main.pex --no-emit-warnings --manylinux manylinux2014 --requirements-pex local_dists.pex --interpreter-constraint $'CPython==3.8.*' --entry-point $'project2.main:main' $'--sources-directory=source_files' $'boto3-stubs<2.0.0,>=1.24.49' $'boto3<2.0.0,>=1.24.49' $'lightgbm<4.0.0,>=3.3.2' $'pandas<2.0.0,>=1.4.3' --lock lockfiles/python-default.lock --no-pypi $'--index=<https://pypi.org/simple/>' --layout zipapp --no-compress
I find:
$ time PEX_VERBOSE=1 /tmp/process-executionALkzsU/__run.sh
...
pex: Laying out PEX zipfile local_dists.pex: 0.1ms
pex: Resolving distributions (boto3-stubs<2.0.0,>=1.24.49 boto3<2.0.0,>=1.24.49 lightgbm<4.0.0,>=3.3.2 pandas<2.0.0,>=1.4.3): 1261.2ms
pex: Parsing lock lockfiles/python-default.lock: 442.8ms
pex: Resolving requirements from lock file lockfiles/python-default.lock: 817.9ms
pex: Parsing requirements: 1.6ms
pex: Resolving urls to fetch for 4 requirements from lock lockfiles/python-default.lock: 18.3ms
pex: Downloading 21 distributions to satisfy 4 requirements: 7.1ms
pex: Categorizing 21 downloaded artifacts: 0.1ms
pex: Building 0 artifacts and installing 21: 784.4ms
pex: Calculating project names for direct requirements:
PyPIRequirement(line=LogicalLine(raw_text='boto3-stubs<2.0.0,>=1.24.49', processed_text='boto3-stubs<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3-stubs', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
PyPIRequirement(line=LogicalLine(raw_text='boto3<2.0.0,>=1.24.49', processed_text='boto3<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
PyPIRequirement(line=LogicalLine(raw_text='lightgbm<4.0.0,>=3.3.2', processed_text='lightgbm<4.0.0,>=3.3.2', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='lightgbm', url=None, extras=frozenset(), specifier=<SpecifierSet('<4.0.0,>=3.3.2')>, marker=None), editable=False)
PyPIRequirement(line=LogicalLine(raw_text='pandas<2.0.0,>=1.4.3', processed_text='pandas<2.0.0,>=1.4.3', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='pandas', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.4.3')>, marker=None), editable=False): 0.0ms
pex: Installing 21 distributions: 783.1ms
pex: Zipping PEX file.: 685.3ms
real 0m2.392s
user 0m2.083s
sys 0m0.310s
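To illustrate why skipping compression helps, here is a minimal sketch using Python's standard `zipfile` module rather than PEX itself. The assumption (not confirmed in this thread) is that `--no-compress` amounts to storing entries without deflating them; the payload is synthetic and `zip_payload` is a hypothetical helper.

```python
import io
import os
import time
import zipfile
from typing import Tuple


def zip_payload(payload: bytes, compression: int) -> Tuple[int, float]:
    """Write payload into an in-memory zip; return (archive size, seconds taken)."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    with zipfile.ZipFile(buffer, "w", compression=compression) as zf:
        zf.writestr("data.bin", payload)
    return len(buffer.getvalue()), time.perf_counter() - start


# Synthetic data standing in for installed distributions:
# 4 MiB of incompressible bytes followed by 4 MiB of zeros.
payload = os.urandom(4 * 1024 * 1024) + bytes(4 * 1024 * 1024)

deflated_size, deflated_secs = zip_payload(payload, zipfile.ZIP_DEFLATED)
stored_size, stored_secs = zip_payload(payload, zipfile.ZIP_STORED)

print(f"deflated: {deflated_size} bytes in {deflated_secs:.3f}s")
print(f"stored:   {stored_size} bytes in {stored_secs:.3f}s")
```

The trade-off is a larger archive in exchange for less CPU time. In the two runs above, the zipping step dropped from 10239.8ms to 685.3ms, roughly a 15x reduction.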
`pex_binary` that allows you to turn compression off however.
steep-controller-23208
08/11/2022, 4:04 PM
`--no-compress` option, Thank you!~🙏
enough-analyst-54434
08/11/2022, 4:05 PM
hundreds-father-404
08/11/2022, 4:05 PM
> whole new zip needs to be created and most of the latency you observe is, in fact, time spent zipping.
Ah ha, and this is because you're running `package`, which includes the source files in the PEX. This is not the case when running other goals like `test` and `run`.
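A rough illustration of that point, not how Pants computes its cache keys: if the source files are part of an artifact's inputs, any source edit changes the artifact's identity, whereas a requirements-only artifact is unaffected. The `digest` helper and the sample sources are hypothetical.

```python
import hashlib
from typing import Iterable


def digest(parts: Iterable[bytes]) -> str:
    """Hash a sequence of byte strings into a single hex digest."""
    h = hashlib.sha256()
    for part in parts:
        # A length prefix keeps adjacent parts from being confused with each other.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


requirements = [b"boto3<2.0.0,>=1.24.49", b"pandas<2.0.0,>=1.4.3"]
source_v1 = b"def main():\n    pass\n"
source_v2 = b"def main():\n    print(1)\n"

# Requirements-only artifact: its inputs are unaffected by the source edit.
reqs_key_before = digest(requirements)
reqs_key_after = digest(requirements)

# Packaged artifact: sources are part of the inputs, so the edit changes the key.
package_key_before = digest(requirements + [source_v1])
package_key_after = digest(requirements + [source_v2])
```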
enough-analyst-54434
08/11/2022, 4:07 PM
hundreds-father-404
08/11/2022, 4:12 PM
`run` a `python_source` using the cheat where we don't include the source file in the PEX. But if you run on a `pex_binary`, it is the same as first running `package`, then explicitly running `dist/my_app.pex`; no cheat.
enough-analyst-54434
08/11/2022, 4:13 PM
bitter-ability-32190
08/11/2022, 4:14 PM
steep-controller-23208
08/11/2022, 4:16 PM
`test` locally, but tests for ML applications are hard😩)
enough-analyst-54434
08/11/2022, 11:08 PM
`pex_binary(layout="packed", ...)` and rsync the PEX over ssh (it will now be a directory and not a single zip) to the Kubernetes machine.
steep-controller-23208
08/12/2022, 12:50 AM