b

bitter-ability-32190

06/29/2022, 9:18 PM
I'm seeing very high startup times on a certain docker image, which happens to be running a PEX binary that's ~1.5GB in size and contains at least 1 GB worth of a dir containing 100k files. Any tips on making this more startup-friendly? 🙂
`layout="loose"` was not the ticket
h

happy-kitchen-89482

06/29/2022, 9:30 PM
Is this venv mode?
b

bitter-ability-32190

06/29/2022, 9:30 PM
I might run the app in my dockerfile so it's uploaded unpacked
Yes venv mode
w

witty-crayon-22786

06/29/2022, 9:30 PM
running as a true zipapp would avoid actually extracting the contents to disk first
h

happy-kitchen-89482

06/29/2022, 9:30 PM
Yeah, so the one time creation of the venv is supposed to amortize over multiple invocations
w

witty-crayon-22786

06/29/2022, 9:30 PM
but i don’t think that that mode exists anymore.
h

happy-kitchen-89482

06/29/2022, 9:31 PM
but in the docker world you often only have the one invocation per container...
I think it does?
If you don't set venv or loose?
w

witty-crayon-22786

06/29/2022, 9:31 PM
pyoxidizer gets you the moral equivalent of a zipapp
c

careful-address-89803

06/29/2022, 9:32 PM
we had a problem with a dir with fewer files than that causing problems with basic file operations (like `ls`). We solved it by sharding it into subfolders based on filename hash (I think the last 2 hexdigits, so "00" through "FF"). Though it sounds like solutions like that are off the table
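A minimal sketch of the sharding idea described above, assuming SHA-1 of the file name as the hash (the message does not say which hash was used). The function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def shard_name(filename: str) -> str:
    """Return a two-hex-digit bucket ("00".."ff") derived from the file name.

    Assumption: SHA-1 of the name; any stable hash would do.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return digest[-2:]


def shard_directory(flat_dir: Path, sharded_dir: Path) -> None:
    """Move every regular file in flat_dir into sharded_dir/<bucket>/<name>."""
    # Snapshot the listing first so moving files does not disturb iteration.
    for path in list(flat_dir.iterdir()):
        if not path.is_file():
            continue
        bucket = sharded_dir / shard_name(path.name)
        bucket.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(bucket / path.name))


def sharded_path(sharded_dir: Path, filename: str) -> Path:
    """Compute where a file lives after sharding, without listing any directory."""
    return sharded_dir / shard_name(filename) / filename
```

With 256 buckets, 100k files works out to roughly 400 files per directory, and lookups go through `sharded_path` rather than a directory listing. The application still has to change how it resolves paths, which is the hard part here.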
w

witty-crayon-22786

06/29/2022, 9:32 PM
filesystems suck =(
b

bitter-ability-32190

06/29/2022, 9:33 PM
As the build guy, application changes like that are hard to request 😭
Also Stu, we use `__file__` all over, so true zipapp is a tiny nonstarter 😭
w

witty-crayon-22786

06/29/2022, 9:35 PM
you might be able to fiddle with those files such that they are appended to the pythonpath as a zipfile
c

careful-address-89803

06/29/2022, 9:36 PM
Oof, I figured. You might also be able to not include the files in the PEX, and have them included in a separate stage of the docker build. That way the wild fileio happens before the pex unzip?
w

witty-crayon-22786

06/29/2022, 9:36 PM
@bitter-ability-32190: but… is this a case where you actually need these on the pythonpath?
if not, you could put them next to the PEX itself in whichever format, and then `open` them
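A rough sketch of that suggestion, assuming the big directory is copied into the image separately and the app finds it through a setting rather than via the import path. The `APP_DATA_DIR` variable and the `/app/data` default are hypothetical names used only for illustration.

```python
import os
from pathlib import Path

# Hypothetical names: neither APP_DATA_DIR nor /app/data comes from the thread.
DATA_DIR_ENV = "APP_DATA_DIR"
DEFAULT_DATA_DIR = Path("/app/data")


def data_dir() -> Path:
    """Directory holding the large data files, kept outside the PEX."""
    return Path(os.environ.get(DATA_DIR_ENV, str(DEFAULT_DATA_DIR)))


def read_data(relative_name: str) -> bytes:
    """Open a data file by plain filesystem path instead of relative to __file__."""
    with open(data_dir() / relative_name, "rb") as handle:
        return handle.read()
```

Because the files never enter the PEX, they are not zipped at build time or extracted at first run. Code that currently builds paths from `__file__` would need to call something like `data_dir()` instead.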
b

bitter-ability-32190

06/29/2022, 9:37 PM
Aren't I unzipping them either way though? We didn't solve the original issue?
I'm thinking of just adding a stage to the build where I call my app with --help 🙃
w

witty-crayon-22786

06/29/2022, 9:39 PM
if this is about incurring a one time cost so that future startups are fast, then yea: that’s very reasonable
b

bitter-ability-32190

06/29/2022, 9:40 PM
Is there a magical command that does the spiritual equivalent to running the PEX such that it is just unpacked?
If only I could test this outside of docker by running the built PEX in one command 😂😉🤔
w

witty-crayon-22786

06/29/2022, 9:58 PM
if you build with embedded pex-tools, you can explicitly ask a PEX to extract to a venv
but i don’t know about “seed your usual cached venv and then exit” flag
e

enough-analyst-54434

06/30/2022, 3:04 AM
@witty-crayon-22786 that mode ~never did what you thought it did. For Pex 2.x, "true" zipapp just ran user code from the zip. All 3rd party deps were always extracted as the easy hedge against platform-specific wheels with embedded `.so`s. Those must be unzipped to load at all. This goes all the way back to the intro of wheel support in Pex.
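For context on why extraction is needed: CPython's zip importer cannot load extension modules, since the dynamic loader needs a real file on disk. A small sketch for checking whether an archive (a wheel or a PEX) contains such files, using only the standard `zipfile` module. The helper name and the suffix list are assumptions for illustration, not anything Pex itself uses.

```python
import zipfile
from typing import List

# Assumed suffix list; covers the common Linux, Windows and macOS cases.
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib")


def native_members(archive) -> List[str]:
    """List archive members that look like native shared libraries.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            name
            for name in zf.namelist()
            # ".so." catches versioned libraries such as libfoo.so.1
            if name.endswith(NATIVE_SUFFIXES) or ".so." in name
        ]
```

An empty result means nothing in the archive looks like a native library by name.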
b

bitter-ability-32190

06/30/2022, 5:38 AM
Yeah that reminds me we also have first party wheels with embedded `.so`s
I'll try your suggestion first, and decouple the big dir if that doesn't work
I think I'm going to do both. Globally use your link as a general docker container recipe for our pex binaries. And then also make this binary exclude the dep, and then in another stage for this container copy and unzip the fat dir
From running Docker in the sandbox:
• Transferring the PEX to the daemon is speedy
• I think `PEX_TOOLS=1 python3.8 /bin/app.pex venv --scope=deps --compile /app` calls are the slow ones
  ◦ I don't have much imagination on debugging this, but I suspect bundling a PEX with >100k firstparty sources is slow to unpack?
• Building the PEX without the files, and having docker place them in the image seems "fast" by comparison
e

enough-analyst-54434

07/01/2022, 12:26 AM
@bitter-ability-32190 to your second point, you pinpoint `--scope=deps` but mention >100k 1st party sources. 1st party != deps so they should be unrelated. I guess I'm not tracking your thought process combining those two elements there.
FWIW, most interesting Pex timings are revealed with PEX_VERBOSE=1 (or 2 - higher gets noisy).
As far as packing and unpacking go, it's the packing with default --compress that can be slow. I added --no-compress a while back but Pants does not expose a knob. That can make packing considerably faster at the cost of size.
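To see the store-versus-deflate trade-off without involving Pex at all, here is a small sketch using Python's standard `zipfile` module. It is only an analogy for `--compress` / `--no-compress`, and the `pack` helper is a hypothetical name.

```python
import time
import zipfile
from pathlib import Path


def pack(src_dir: Path, archive: Path, compression: int) -> float:
    """Zip every file under src_dir into archive; return elapsed seconds.

    Pass zipfile.ZIP_STORED (no compression) or zipfile.ZIP_DEFLATED.
    """
    start = time.perf_counter()
    with zipfile.ZipFile(archive, "w", compression=compression) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store entries relative to src_dir, with forward slashes.
                zf.write(path, path.relative_to(src_dir).as_posix())
    return time.perf_counter() - start
```

Calling `pack(src, out_a, zipfile.ZIP_STORED)` and `pack(src, out_b, zipfile.ZIP_DEFLATED)` on the same tree and comparing both the returned times and the archive sizes shows the trade: deflate spends CPU to shrink the archive, stored mode skips that work and produces a larger file.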
b

bitter-ability-32190

07/01/2022, 12:29 AM
Ah I copied the first line, that should be scope=srcs
e

enough-analyst-54434

07/01/2022, 12:36 AM
Ok. What do you mean by slow then? My experiment:
# Create 100k:
$ mkdir big
$ man ls > big/seed
$ for i in `seq 1 100000`; do cp big/seed big/$i; done
$ du -sh big/
782M	big/



# Native zip store / unpack 100k no compression:
$ time zip -qr --compression-method store big.stored.zip big/

real	0m2.187s
user	0m1.552s
sys	0m0.624s
$ time unzip -q -d stored big.stored.zip 

real	0m3.427s
user	0m2.661s
sys	0m0.673s



# Native zip store / unpack 100k default compression:
$ time zip -qr big.default.zip big/

real	0m12.674s
user	0m12.083s
sys	0m0.576s
$ time unzip -q -d default big.default.zip 

real	0m3.757s
user	0m3.167s
sys	0m0.547s



# Create PEX / unpack PEX default compression 100k 1st party:
$ time pex -D big -o big.default.pex

real	0m18.790s
user	0m17.191s
sys	0m1.545s
$ time pex-tools big.default.pex venv /tmp/big.default.pex.venv

real	0m14.282s
user	0m8.930s
sys	0m5.193s



# Create PEX / unpack PEX no compression 100k 1st party:
$ time pex -D big --no-compress -o big.stored.pex

real	0m6.879s
user	0m5.129s
sys	0m1.713s
$ time pex-tools big.stored.pex venv /tmp/big.stored.pex.venv

real	0m5.205s
user	0m3.622s
sys	0m1.539s
I'm sure my "big" does not match your case, but it's a starting point we can share.
b

bitter-ability-32190

07/01/2022, 12:38 AM
First-time startup was in the hundreds of seconds. When we shifted that to the "unpack first" approach, I saw similar times.
I can spend some time tomorrow timing. FWIW We fixed it by using that multi-stage build you linked (🎉 ) and added a new stage which copied the sources. That way when the code changes, it doesn't need to recopy. Overall a win.
e

enough-analyst-54434

07/01/2022, 12:40 AM
Ok, stepping back: If you pre-create the venv with --compile, then you're left with a native, precompiled venv. That should have the same startup time as a native venv. Are you debating that or are you just concerned with the time taken to pre-seed the venv in the Dockerfile build step?
b

bitter-ability-32190

07/01/2022, 12:41 AM
Not debating. My concern was startup time when not seeded, and pre-seed time when seeded.
e

enough-analyst-54434

07/01/2022, 12:41 AM
Perhaps file some issues with more context tomorrow if I need to do anything. I'm still a bit lost if you're sorted or left wanting.
b

bitter-ability-32190

07/01/2022, 12:42 AM
definitely sorted. Multi-stage build was the hot ticket for all our images using pex and an additional stage made this speedy
e

enough-analyst-54434

07/01/2022, 12:42 AM
OK, great.