# general
b
I'm seeing very high startup times on a certain docker image, which happens to be running a PEX binary that's ~1.5GB in size and contains at least 1 GB worth of a dir containing 100k files. Any tips on making this more startup-friendly? 🙂
😲 2
`layout="loose"` was not the ticket
h
Is this venv mode?
b
I might run the app in my dockerfile so it's uploaded unpacked
Yes venv mode
w
running as a true zipapp would avoid actually extracting the contents to disk first
h
Yeah, so the one time creation of the venv is supposed to amortize over multiple invocations
w
but i don't think that mode exists anymore.
h
but in the docker world you often only have the one invocation per container...
I think it does?
If you don't set venv or loose?
w
@happy-kitchen-89482: https://www.pantsbuild.org/docs/reference-pex_binary#codeexecution_modecode “modified zipapp”
pyoxidizer gets you the moral equivalent of a zipapp
c
we had a problem with a dir with fewer files than that causing problems with basic file operations (like `ls`). We solved it by sharding it into subfolders based on filename hash (I think the last 2 hexdigits, so "00" through "FF"). Though it sounds like solutions like that are off the table
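A minimal sketch of that kind of sharding, assuming a flat directory of regular files. The hash actually used isn't stated above, so `cksum` (a CRC of the filename) stands in for it here, and `shard_dir` / `shard_path` are hypothetical helper names:

```shell
# Assumption: cksum's CRC stands in for whatever filename hash was really used.

# Print the shard path for a filename: <dir>/<last two hex digits>/<name>.
shard_path() {
    sp_dir=$1
    sp_name=$2
    # With stdin input, cksum prints "<crc> <byte count>"; keep only the CRC.
    sp_crc=$(printf '%s' "$sp_name" | cksum | cut -d ' ' -f 1)
    # The low byte of the CRC is its last two hex digits ("00" through "ff").
    printf '%s/%02x/%s\n' "$sp_dir" $((sp_crc % 256)) "$sp_name"
}

# Move every regular file directly inside <dir> into its shard subfolder.
shard_dir() {
    sd_dir=$1
    for sd_path in "$sd_dir"/*; do
        [ -f "$sd_path" ] || continue      # skip subfolders and an empty glob
        sd_target=$(shard_path "$sd_dir" "${sd_path##*/}")
        mkdir -p "${sd_target%/*}"
        mv "$sd_path" "$sd_target"
    done
}
```

Usage would look like `shard_dir big` once, then `cat "$(shard_path big 42)"` wherever the app used to open `big/42`, which is the application change that makes this hard to adopt.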
w
filesystems suck =(
b
As the build guy, application changes like that are hard to request 😭
Also Stu we use dunder file (`__file__`) all over, so true zipapp is a tiny nonstarter 😭
w
you might be able to fiddle with those files such that they are appended to the pythonpath as a zipfile
c
Oof, I figured. You might also be able to not include the files in the PEX, and have them included in a separate stage of the docker build. That way the wild fileio happens before the pex unzip?
w
@bitter-ability-32190: but… is this a case where you actually need these on the pythonpath?
if not, you could put them next to the PEX itself in whichever format, and then `open` them
b
Aren't I unzipping them either way though? We didn't solve the original issue?
I'm thinking of just adding a stage to the build where I call my app with --help 🙃
w
if this is about incurring a one time cost so that future startups are fast, then yea: that's very reasonable
b
Is there a magical command that does the spiritual equivalent to running the PEX such that it is just unpacked?
If only I could test this outside of docker by running the built PEX in one command 😂😉🤔
w
if you build with embedded pex-tools, you can explicitly ask a PEX to extract to a venv
but i don't know about a “seed your usual cached venv and then exit” flag
e
@witty-crayon-22786 that mode ~never did what you thought it did. For Pex 2.x "true" zipapp just ran user code from the zip. All 3rd party deps were always extracted as the easy hedge against platform specific wheels with embedded `.so`s. Those must be unzipped to load at all. This goes all the way back to the intro of wheel support in Pex.
b
Yeah that reminds me we also have first party wheels with embedded `.so`s
I'll try your suggestion first, and decouple the big dir if that doesn't work
I think I'm going to do both. Globally use your link as a general docker container recipe for our pex binaries. And then also make this binary exclude the dep, and then in another stage for this container copy and unzip the fat dir
From running Docker in the sandbox:
• Transferring the PEX to the daemon is speedy
• I think `PEX_TOOLS=1 python3.8 /bin/app.pex venv --scope=deps --compile /app` calls are the slow ones
  ◦ I don't have much imagination on debugging this, but I suspect bundling a PEX with >100k firstparty sources is slow to unpack?
• Building the PEX without the files, and having docker place them in the image seems "fast" by comparison
e
@bitter-ability-32190 to your second point, you pinpoint `--scope=deps` but mention >100k 1st party sources. 1st party != deps so they should be unrelated. I guess I'm not tracking your thought process combining those two elements there.
FWIW, most interesting Pex timings are revealed with PEX_VERBOSE=1 (or 2 - higher gets noisy).
✅ 1
As far as packing and unpacking go, it's the packing with default --compress that can be slow. I added --no-compress a while back but Pants does not expose a knob. That can make packing considerably faster at the cost of size.
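A rough sketch of how those timings could be captured for an unseeded versus a seeded start. It assumes Pex writes its verbose output to stderr, that the first run really is cold (nothing cached yet), and `compare_startups` is a hypothetical helper name:

```shell
# Run the given command twice with PEX_VERBOSE=1, saving each run's stderr and
# printing the wall-clock seconds per run. "cold" is only cold if nothing has
# been unpacked or cached before the first run (an assumption of this sketch).
compare_startups() {
    cs_logs=$1
    shift
    mkdir -p "$cs_logs"
    for cs_run in cold warm; do
        cs_start=$(date +%s)
        PEX_VERBOSE=1 "$@" >/dev/null 2>"$cs_logs/$cs_run.log"
        cs_end=$(date +%s)
        echo "$cs_run: $((cs_end - cs_start))s"
    done
}
```

For example, `compare_startups /tmp/pex-logs python3.8 /bin/app.pex --help` would leave `cold.log` and `warm.log` to diff; whole-second resolution is coarse, but enough when the first start is in the hundreds of seconds.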
b
Ah I copied the first line, that should be scope=srcs
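For reference, the pre-seeding call with the scope as a parameter might be wrapped like this. It is a sketch only: `seed_pex_venv` is a hypothetical helper, the paths are the ones quoted above, and it assumes the PEX was built with embedded pex-tools:

```shell
# Extract one scope (deps or srcs) of a PEX into a precompiled venv.
# Intended to be called once per Docker build stage, one scope per stage.
seed_pex_venv() {
    sv_python=$1   # interpreter to run the PEX with, e.g. python3.8
    sv_pex=$2      # path to the PEX file, e.g. /bin/app.pex
    sv_scope=$3    # "deps" for 3rd party code, "srcs" for 1st party code
    sv_dest=$4     # venv directory to create, e.g. /app
    PEX_TOOLS=1 "$sv_python" "$sv_pex" venv --scope="$sv_scope" --compile "$sv_dest"
}
```

One stage would run `seed_pex_venv python3.8 /bin/app.pex deps /app` and another `seed_pex_venv python3.8 /bin/app.pex srcs /app`, so a source-only change leaves the deps layer cached.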
e
Ok. What do you mean by slow then? My experiment:
# Create 100k:
$ mkdir big
$ man ls > big/seed
$ for i in `seq 1 100000`; do cp big/seed big/$i; done
$ du -sh big/
782M	big/



# Native zip store / unpack 100k no compression:
$ time zip -qr --compression-method store big.stored.zip big/

real	0m2.187s
user	0m1.552s
sys	0m0.624s
$ time unzip -q -d stored big.stored.zip 

real	0m3.427s
user	0m2.661s
sys	0m0.673s



# Native zip store / unpack 100k default compression:
$ time zip -qr big.default.zip big/

real	0m12.674s
user	0m12.083s
sys	0m0.576s
$ time unzip -q -d default big.default.zip

real	0m3.757s
user	0m3.167s
sys	0m0.547s



# Create PEX / unpack PEX default compression 100k 1st party:
$ time pex -D big -o big.default.pex

real	0m18.790s
user	0m17.191s
sys	0m1.545s
$ time pex-tools big.default.pex venv /tmp/big.default.pex.venv

real	0m14.282s
user	0m8.930s
sys	0m5.193s



# Create PEX / unpack PEX no compression 100k 1st party:
$ time pex -D big --no-compress -o big.stored.pex

real	0m6.879s
user	0m5.129s
sys	0m1.713s
$ time pex-tools big.stored.pex venv /tmp/big.stored.pex.venv

real	0m5.205s
user	0m3.622s
sys	0m1.539s
I'm sure my "big" is not matching your case, but it's a starting point we can share.
b
Startup first time was in the hundreds of seconds. When we shifted that to the "unpack first" approach I saw similar times.
I can spend some time tomorrow timing. FWIW We fixed it by using that multi-stage build you linked (🎉) and added a new stage which copied the sources. That way when the code changes, it doesn't need to recopy. Overall a win.
e
Ok, stepping back: If you pre-create the venv with --compile, then you're left with a native, precompiled venv. That should have the same startup time as a native venv. Are you debating that or are you just concerned with the time taken to pre-seed the venv in the Dockerfile build step?
b
Not debating. Startup time when not seeded. pre-seed time when seeded.
e
Perhaps file some issues with more context tomorrow if I need to do anything. I'm still a bit lost if you're sorted or left wanting.
b
definitely sorted. Multi-stage build was the hot ticket for all our images using pex and an additional stage made this speedy
e
OK, great.