Hi, I wonder why pants always takes a time while b...
# general
s
Hi, I wonder why Pants always takes a while to build third-party packages, showing the message
Building ${n} requirements for ${pexfile} from the lockfiles/python-default.lock
, even though the lock does not change. Is this a required step for building a PEX, or is there any way to speed it up? Thank you in advance.
h
Hi! Is this on desktop or in CI?
s
Thank you for your response. This is on my desktop.
And the process takes about 30 secs. This is not too slow, but my current build pipeline (creating a base Docker image with the third-party packages) is still faster than the Pants build, so I cannot yet decide to use Pants.
h
This shouldn’t be happening, the results of the requirements build should be cached locally.
So it does that action every time you run it?
h
This shouldn’t be happening, the results of the requirements build should be cached locally.
Well, sort of. As explained at https://www.pantsbuild.org/docs/python-third-party-dependencies, Pants uses the precise subset of the lockfile you need for a particular task, which gives benefits like fine-grained caching. Assuming you're using a Pex-generated lockfile (the default in 2.12+), we install only the subset we need. The download step should be cached, and you already paid the cost of "resolution" (figuring out which versions to use) when the lockfile was created. But it's still not instant. This is controlled by this option: https://www.pantsbuild.org/docs/reference-python#section-run-against-entire-lockfile
s
Not exactly. It happens when I make a change to my code, such as adding
print(1)
, but I expected Pants to cache the third-party packages and skip this step, because I did not change them.
h
When I made changes on my code like (print(1)) , it happens.
That is definitely not expected. Unless you added some new third-party requirement to the file, where the subset of deps has not been encountered before
where the subset of deps has not been encountered before
The same "requirements.pex" gets shared across many builds. And even when it's not, the downloading of the dependencies should still be shared across every build, such that creating a new requirements.pex is faster
h
@steep-controller-23208 are you able to post an example repo on github that demonstrates the issue?
Using your real requirements and lockfile if possible, but not all your real internal code of course…
s
Assuming you're using a Pex-generated lockfile (the default in 2.12+), we install only the subset we need. The download step should be cached, and you already paid the cost of "resolution" (figuring out which versions to use) when the lockfile was created. But it's still not instant.
Yes, I am using a Pex-generated lockfile, and I confirmed that the third-party packages are already downloaded on my machine, because the build succeeds in an offline environment.
Ok! I made the repo, please wait a while.
Thank you again!
This is a sample repo for this issue. You can comment this line in or out, or make any change to this file, such as adding
print(1)
, and you will see the build step run every time.
./pants package project2/project2:main
(Of course, if exactly the same code has already been built once, the cache works as expected and the process finishes instantly.)
e
So, on my machine:
$ time ./pants --no-process-cleanup package project2/project2:main
08:55:46.97 [INFO] Preserving local process execution dir /tmp/process-executionMoTCqo for "Determine Python dependencies for project2/project2/main.py"
08:55:46.99 [INFO] Preserving local process execution dir /tmp/process-executionALkzsU for "Building 4 requirements for project2.project2/project2_main-main.pex from the lockfiles/python-default.lock resolve: boto3-stubs<2.0.0,>=1.24.49, boto3<2.0.0,>=1.24.49, lightgbm<4.0.0,>=3.3.2, pandas<2.0.0,>=1.4.3"
08:55:59.13 [INFO] Completed: Building 4 requirements for project2.project2/project2_main-main.pex from the lockfiles/python-default.lock resolve: boto3-stubs<2.0.0,>=1.24.49, boto3<2.0.0,>=1.24.49, lightgbm<4.0.0,>=3.3.2, pandas<... (13 characters truncated)
08:55:59.17 [INFO] Wrote dist/project2.project2/project2_main-main.pex

real    0m12.716s
user    0m0.406s
sys     0m0.053s
Running the PEX creation process directly:
$ time /tmp/process-executionALkzsU/__run.sh 

real    0m11.918s
user    0m11.644s
sys     0m0.270s
So there is ~800ms of Pants overhead there. Breaking down the PEX generation timings shows:
$ time PEX_VERBOSE=1 /tmp/process-executionALkzsU/__run.sh
...
pex:   Laying out PEX zipfile local_dists.pex: 0.1ms
pex:   Resolving distributions (boto3-stubs<2.0.0,>=1.24.49 boto3<2.0.0,>=1.24.49 lightgbm<4.0.0,>=3.3.2 pandas<2.0.0,>=1.4.3): 1308.2ms
pex:     Parsing lock lockfiles/python-default.lock: 441.2ms
pex:     Resolving requirements from lock file lockfiles/python-default.lock: 866.5ms
pex:       Parsing requirements: 1.6ms
pex:       Resolving urls to fetch for 4 requirements from lock lockfiles/python-default.lock: 19.3ms
pex:       Downloading 21 distributions to satisfy 4 requirements: 7.2ms
pex:       Categorizing 21 downloaded artifacts: 0.1ms
pex:       Building 0 artifacts and installing 21: 831.3ms
pex:         Calculating project names for direct requirements:
  PyPIRequirement(line=LogicalLine(raw_text='boto3-stubs<2.0.0,>=1.24.49', processed_text='boto3-stubs<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3-stubs', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
  PyPIRequirement(line=LogicalLine(raw_text='boto3<2.0.0,>=1.24.49', processed_text='boto3<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
  PyPIRequirement(line=LogicalLine(raw_text='lightgbm<4.0.0,>=3.3.2', processed_text='lightgbm<4.0.0,>=3.3.2', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='lightgbm', url=None, extras=frozenset(), specifier=<SpecifierSet('<4.0.0,>=3.3.2')>, marker=None), editable=False)
  PyPIRequirement(line=LogicalLine(raw_text='pandas<2.0.0,>=1.4.3', processed_text='pandas<2.0.0,>=1.4.3', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='pandas', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.4.3')>, marker=None), editable=False): 0.0ms
pex:         Installing 21 distributions: 830.1ms
pex: Zipping PEX file.: 10239.8ms

real	0m11.999s
user	0m11.784s
sys	0m0.210s
So almost all the time is spent zipping.
If I modify the
__run.sh
to use PEX's
--no-compress
feature:
$ diff /tmp/process-executionALkzsU/__run.sh.orig /tmp/process-executionALkzsU/__run.sh
5c5
< /home/jsirois/.pyenv/versions/3.10.6/bin/python ./pex --tmpdir .tmp --jobs 4 --python-path $'/home/jsirois/.pyenv/versions/2.7.18/bin:/home/jsirois/.pyenv/versions/3.10.6/bin:/home/jsirois/.pyenv/versions/3.11.0b5/bin:/home/jsirois/.pyenv/versions/3.6.15/bin:/home/jsirois/.pyenv/versions/3.7.13/bin:/home/jsirois/.pyenv/versions/3.8.13/bin:/home/jsirois/.pyenv/versions/3.9.13/bin:/home/jsirois/.pyenv/versions/pypy2.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.6-7.3.3/bin:/home/jsirois/.pyenv/versions/pypy3.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.8-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.9-7.3.9/bin:/home/jsirois/.pyenv/shims:/home/jsirois/Downloads/google-cloud-sdk/bin:/home/jsirois/.cargo/bin:/home/jsirois/.pyenv/bin:/home/jsirois/bin:/home/jsirois/.local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/var/lib/flatpak/exports/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/jvm/default/bin:/home/jsirois/go/bin' --output-file project2.project2/project2_main-main.pex --no-emit-warnings --manylinux manylinux2014 --requirements-pex local_dists.pex --interpreter-constraint $'CPython==3.8.*' --entry-point $'project2.main:main' $'--sources-directory=source_files' $'boto3-stubs<2.0.0,>=1.24.49' $'boto3<2.0.0,>=1.24.49' $'lightgbm<4.0.0,>=3.3.2' $'pandas<2.0.0,>=1.4.3' --lock lockfiles/python-default.lock --no-pypi $'--index=<https://pypi.org/simple/>' --layout zipapp
---
> /home/jsirois/.pyenv/versions/3.10.6/bin/python ./pex --tmpdir .tmp --jobs 4 --python-path $'/home/jsirois/.pyenv/versions/2.7.18/bin:/home/jsirois/.pyenv/versions/3.10.6/bin:/home/jsirois/.pyenv/versions/3.11.0b5/bin:/home/jsirois/.pyenv/versions/3.6.15/bin:/home/jsirois/.pyenv/versions/3.7.13/bin:/home/jsirois/.pyenv/versions/3.8.13/bin:/home/jsirois/.pyenv/versions/3.9.13/bin:/home/jsirois/.pyenv/versions/pypy2.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.6-7.3.3/bin:/home/jsirois/.pyenv/versions/pypy3.7-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.8-7.3.9/bin:/home/jsirois/.pyenv/versions/pypy3.9-7.3.9/bin:/home/jsirois/.pyenv/shims:/home/jsirois/Downloads/google-cloud-sdk/bin:/home/jsirois/.cargo/bin:/home/jsirois/.pyenv/bin:/home/jsirois/bin:/home/jsirois/.local/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/var/lib/flatpak/exports/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/jvm/default/bin:/home/jsirois/go/bin' --output-file project2.project2/project2_main-main.pex --no-emit-warnings --manylinux manylinux2014 --requirements-pex local_dists.pex --interpreter-constraint $'CPython==3.8.*' --entry-point $'project2.main:main' $'--sources-directory=source_files' $'boto3-stubs<2.0.0,>=1.24.49' $'boto3<2.0.0,>=1.24.49' $'lightgbm<4.0.0,>=3.3.2' $'pandas<2.0.0,>=1.4.3' --lock lockfiles/python-default.lock --no-pypi $'--index=<https://pypi.org/simple/>' --layout zipapp --no-compress
I find:
$ time PEX_VERBOSE=1 /tmp/process-executionALkzsU/__run.sh
...
pex:   Laying out PEX zipfile local_dists.pex: 0.1ms
pex:   Resolving distributions (boto3-stubs<2.0.0,>=1.24.49 boto3<2.0.0,>=1.24.49 lightgbm<4.0.0,>=3.3.2 pandas<2.0.0,>=1.4.3): 1261.2ms
pex:     Parsing lock lockfiles/python-default.lock: 442.8ms
pex:     Resolving requirements from lock file lockfiles/python-default.lock: 817.9ms
pex:       Parsing requirements: 1.6ms
pex:       Resolving urls to fetch for 4 requirements from lock lockfiles/python-default.lock: 18.3ms
pex:       Downloading 21 distributions to satisfy 4 requirements: 7.1ms
pex:       Categorizing 21 downloaded artifacts: 0.1ms
pex:       Building 0 artifacts and installing 21: 784.4ms
pex:         Calculating project names for direct requirements:
  PyPIRequirement(line=LogicalLine(raw_text='boto3-stubs<2.0.0,>=1.24.49', processed_text='boto3-stubs<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3-stubs', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
  PyPIRequirement(line=LogicalLine(raw_text='boto3<2.0.0,>=1.24.49', processed_text='boto3<2.0.0,>=1.24.49', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='boto3', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.24.49')>, marker=None), editable=False)
  PyPIRequirement(line=LogicalLine(raw_text='lightgbm<4.0.0,>=3.3.2', processed_text='lightgbm<4.0.0,>=3.3.2', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='lightgbm', url=None, extras=frozenset(), specifier=<SpecifierSet('<4.0.0,>=3.3.2')>, marker=None), editable=False)
  PyPIRequirement(line=LogicalLine(raw_text='pandas<2.0.0,>=1.4.3', processed_text='pandas<2.0.0,>=1.4.3', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='pandas', url=None, extras=frozenset(), specifier=<SpecifierSet('<2.0.0,>=1.4.3')>, marker=None), editable=False): 0.0ms
pex:         Installing 21 distributions: 783.1ms
pex: Zipping PEX file.: 685.3ms

real	0m2.392s
user	0m2.083s
sys	0m0.310s
Pants does not yet expose an option on
pex_binary
that allows you to turn compression off however.
s
I see, so that means huge dependencies make zipping slow? ~I’ll try
--no-compress
option, Thank you!~🙏
e
Thanks @steep-controller-23208 for the sample repository - that made this very easy to drill into. Hopefully the above answers your question - basically, even though you changed just 1 source file, a whole new zip needs to be created and most of the latency you observe is, in fact, time spent zipping. That really can't be improved much except by not compressing when zipping. There are obviously tradeoffs there since the PEX becomes bigger.
Yes, huge dependencies are huge, so zipping (with default compression) is slow.
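The size-versus-speed trade-off described above can be sketched with Python's standard `zipfile` module. This is not Pex's own code, only an illustration of the general effect: the `payload` contents and the `zip_payload` helper are made up for the example, and the timings will vary by machine.

```python
from __future__ import annotations

import io
import time
import zipfile


def zip_payload(payload: dict[str, bytes], compression: int) -> tuple[int, float]:
    """Zip the payload in memory and return (archive size in bytes, seconds taken)."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        for name, data in payload.items():
            archive.writestr(name, data)
    elapsed = time.perf_counter() - start
    return len(buffer.getvalue()), elapsed


# Hypothetical stand-in for installed third-party code (repetitive text).
payload = {
    f"pkg/module_{i}.py": b"def f(x):\n    return x + 1\n" * 2000
    for i in range(50)
}

# ZIP_DEFLATED compresses each entry; ZIP_STORED writes the bytes as-is.
deflated_size, deflated_secs = zip_payload(payload, zipfile.ZIP_DEFLATED)
stored_size, stored_secs = zip_payload(payload, zipfile.ZIP_STORED)

print(f"deflated: {deflated_size} bytes in {deflated_secs:.3f}s")
print(f"stored:   {stored_size} bytes in {stored_secs:.3f}s")
```

The stored archive is larger than the deflated one because no compression is applied, which is the same trade-off as building the PEX with `--no-compress`: less CPU time spent zipping, a bigger file on disk.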
h
whole new zip needs to be created and most of the latency you observe is, in fact, time spent zipping.
Ah ha, and this is because you're running
package
which includes the source files in the PEX. This is not the case when running other goals like
test
and
run
e
Yeah, basically test and run cheat.
They do not test the exact production product.
I talked about this with Stu, but it's my contention that these cheats, though very valuable to end users, are very expensive for the codebase and for supporting it. It probably makes sense to maintain a list of cheats and why we cheat. If we could ever eliminate cheats from the list due to technology changes, etc., they would, I think, almost always represent very big wins for end users and maintainers.
h
@bitter-ability-32190 fixed it so `run`'s cheating is less offensive. Now you can
run
a
python_source
using the cheat where we don't include the source file in the PEX. But if you run on a
pex_binary
, it is the same as first running
package
, then explicitly running `dist/my_app.pex`; no cheat
e
Yeah, behavior cheats are very bad and that was very good to fix. Things should behave the same as production always. The speed cheats are the tough ones though.
b
To me, it's not even a cheat. PEX in that case is the powerhouse of how your script is executed in an environment with dependencies (just like when we lint or test).
s
Got it. My final goal is to develop in a remote environment such as Kubernetes. So I am measuring the build time because I want to check the application quickly on the remote side. (I should write more unit tests and run
test
locally, but testing an ML application is hard😩)
e
If you really want to hack that way, use
pex_binary(layout="packed", ...)
and rsync the PEX over ssh (it will now be a directory and not a single zip) to the kubernetes machine.
That will both build fast - no zipping - and network transfer fast since rsync will skip everything but changed source files - which are loose in the packed layout (3rdparty dependencies are individual zips, but they are never re-zipped for the same 3rdparty dep version).
Pants uses this layout internally for those reasons.
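A minimal sketch of what that BUILD change might look like. This is a Pants BUILD file fragment, not a standalone script. The target name `main` and the `main.py:main` entry point are assumptions inferred from the `project2/project2:main` address and the `project2.main:main` entry point in the logs above, so adjust them to your repo.

```python
# project2/project2/BUILD (path assumed from the target address)
pex_binary(
    name="main",
    entry_point="main.py:main",  # assumed; use your real entry point
    # "packed" writes the PEX as a directory: first-party sources stay loose
    # and each third-party dependency is kept as its own zip.
    layout="packed",
)
```

After `./pants package project2/project2:main`, the output under `dist/` should be a directory rather than a single zip file, which is what makes the rsync approach practical.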
s
Oh, thank you! I'll check this!
I found the build step is much faster with this option, and the output becomes a directory. Thank you!