rough-room-65027
06/13/2024, 5:56 AM
curved-television-6568
06/13/2024, 8:16 AM
`pex --help` may provide a little more detail than the Pants docs?
--layout {zipapp,packed,loose}
By default, a PEX is created as a single file zipapp when `-o` is specified, but either a packed or loose directory tree based layout can be chosen instead. A packed layout PEX is an executable directory structure designed to have cache-friendly characteristics for syncing incremental updates to PEXed applications over a network. At the top level of the packed directory tree there is an executable `__main__.py` script. The directory can also be executed by passing its path to a Python executable; e.g.: `python packed-pex-dir/`. The Pex bootstrap code and all dependency code are packed into individual zip files for efficient caching and syncing. A loose layout PEX is similar to a packed PEX, except that neither the Pex bootstrap code nor the dependency code are packed into zip files, but are instead present as collections of loose files in the directory tree providing different caching and syncing tradeoffs. Both zipapp and packed layouts install themselves in the PEX_ROOT as loose apps by default before executing, but these layouts compose with `--venv` execution mode as well and support `--seed`ing. (default: zipapp)
happy-kitchen-89482
06/13/2024, 1:48 PM
`--packed` and `--venv` in practice, especially if deploying in a Docker image. Although `zipapp` is useful if you want a single `.pex` file for ease of deployment of the raw file.
broad-processor-92400
06/14/2024, 7:37 AM
`zipapp` layout is a single large zip file: any change to its contents (even adding a single `.` to a comment in a single source file) will result in a pex with (slightly) different contents, and the whole pex will have to be created from scratch, zipping up all its contents.
• As those docs describe, the `packed` / `loose` layouts use more directories, so sub-parts of the pex are stored separately
• This can be particularly noticeable when a pex uses large dependencies (e.g. numpy, opencv, pytorch, tensorflow):
  ◦ for `zipapp`: even a tiny change to an input source file will store a new many-megabyte pex in the cache, plus pex will spend more time manipulating the dependencies to put them into that zip
  ◦ for `packed`: a tiny change to a source file won't change the dependencies, and thus the `.whl`s in `.deps/` won't need to be recached (i.e. those files will be deduplicated within the cache), plus pex can just copy them around rather than needing to synthesize a whole new zip
  ◦ `loose` is similar to `packed`, although I think it can be slower, since it has to unzip all the dependencies and write their contents to disk
rough-room-65027
06/14/2024, 8:41 AM
`layout`
wide-midnight-78598
06/15/2024, 12:28 AM
`loose` - because in the docs, and in practice, it's a bit harder to grok. I recall needing to run a `tree` on the unzipped pex and diffing to try to get a grasp of what was going on. Benjy's comment is basically where I landed too
wide-midnight-78598
06/15/2024, 12:30 AM
pex_binary(
    name="bin",
    dependencies=[":lib"],
    entry_point="main.py",
    execution_mode=parametrize("venv", "zipapp"),
    layout=parametrize("loose", "packed", "zipapp"),
)
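As a rough illustration of what the two `parametrize` calls in this target expand to, the sketch below enumerates the six combinations and prints them in the `simple:bin@execution_mode=...,layout=...` form used by the `pants package` commands in this thread. The helper name `parametrized_addresses` is made up for illustration, and this is plain Python rather than how Pants computes addresses internally.

```python
from itertools import product


def parametrized_addresses(base, **fields):
    """Enumerate one address string per combination of parametrized field values."""
    # Assumption: field names appear in alphabetical order, matching the
    # addresses typed in this thread (execution_mode before layout).
    names = sorted(fields)
    addresses = []
    for combo in product(*(fields[name] for name in names)):
        params = ",".join(f"{name}={value}" for name, value in zip(names, combo))
        addresses.append(f"{base}@{params}")
    return addresses


addresses = parametrized_addresses(
    "simple:bin",
    execution_mode=["venv", "zipapp"],
    layout=["loose", "packed", "zipapp"],
)
for address in addresses:
    print(address)
```

Two execution modes times three layouts gives six packageable variants, which is why each one can be timed separately.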
wide-midnight-78598
06/15/2024, 12:57 AM
`loose` doesn't seem to be cached as I would expect:
No modifications, `loose` is still packaging for 1 second (when the other variants are 0.1s)
scratch/pants-large % time pants package simple:bin@execution_mode=venv,layout=loose
20:55:12.90 [INFO] Wrote dist/simple/bin@execution_mode=venv,layout=loose.pex
pants package simple:bin@execution_mode=venv,layout=loose 0.01s user 0.01s system 1% cpu 1.043 total
scratch/pants-large % time pants package simple:bin@execution_mode=venv,layout=zipapp
20:55:14.35 [INFO] Wrote dist/simple/bin@execution_mode=venv,layout=zipapp.pex
pants package simple:bin@execution_mode=venv,layout=zipapp 0.01s user 0.01s system 1% cpu 0.103 total
scratch/pants-large % time pants package simple:bin@execution_mode=venv,layout=packed
20:55:18.33 [INFO] Wrote dist/simple/bin@execution_mode=venv,layout=packed.pex
pants package simple:bin@execution_mode=venv,layout=packed 0.01s user 0.01s system 19% cpu 0.109 total
broad-processor-92400
06/15/2024, 1:01 AM
`dist/` and write the new one
wide-midnight-78598
06/15/2024, 1:05 AM
wide-midnight-78598
06/15/2024, 1:06 AM
`package`
broad-processor-92400
06/15/2024, 1:22 AM
`pants package` is finalising its work by writing the output digest(s) into `dist`, it has to:
  ◦ first, delete anything that's already there (i.e. potentially 7k files if overwriting an existing package).
  ◦ then, write each file in the digest to disk
• You'd potentially see similar behaviour with any set of 7k files (e.g. a shell command that generates many files + `pants export-codegen`), even if the output wasn't conceptually connected to a zip
• NB. I don't know the specifics so I could be wrong, but you could get a sense of how much the file manipulation costs with commands like `time rm -rf dist/...` or `time cp -R dist/... /tmp/whatever`.
wide-midnight-78598
06/15/2024, 1:24 AM
wide-midnight-78598
06/15/2024, 1:24 AM
wide-midnight-78598
06/15/2024, 1:25 AM
`scie`s a lot, and I set mine up to unpack on first run - so I'm used to deferring that cost from my `pants` time. Gonna have to remember this... Thanks!
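The delete-then-write cost discussed in this thread can also be estimated without Pants at all. The following is a minimal sketch that times writing and then deleting a tree of small files in a temporary directory. The 7000 figure echoes the "7k files" mentioned above; the helper name `time_write_and_delete`, the 128-byte payload, and the 50 subdirectories are arbitrary assumptions, so the numbers only hint at the order of magnitude on a given machine.

```python
import shutil
import tempfile
import time
from pathlib import Path


def time_write_and_delete(num_files, payload=b"x" * 128):
    """Return (write_seconds, delete_seconds) for a tree of num_files small files."""
    root = Path(tempfile.mkdtemp(prefix="many-files-"))
    out = root / "out"
    out.mkdir()

    start = time.perf_counter()
    for i in range(num_files):
        # Spread files over subdirectories, loosely like an unpacked dependency tree.
        subdir = out / f"pkg{i % 50}"
        subdir.mkdir(exist_ok=True)
        (subdir / f"file{i}.py").write_bytes(payload)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    shutil.rmtree(out)  # comparable in spirit to `rm -rf` on the output directory
    delete_seconds = time.perf_counter() - start

    shutil.rmtree(root)  # clean up the temporary parent directory
    return write_seconds, delete_seconds


for count in (1, 7000):
    written, deleted = time_write_and_delete(count)
    print(f"{count:>5} files: write {written:.3f}s, delete {deleted:.3f}s")
```

Comparing the one-file and many-file rows gives a feel for how much of a package step could be plain filesystem work rather than cache misses.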