I'm following "pex in a container" <https://pex.readthedocs.io/e... (Pants #general)


bitter-ability-32190

07/12/2022, 7:23 PM

I'm following "pex in a container" here. But I'm seeing that the stage for the 3rdparty unpacking is not re-used when only 1stparty changes, as suggested it would be. I'm assuming this is because the PEX for the 3rdparty stage isn't re-used, since it also contains the 1stparty code. Am I understanding this correctly? Is there a way to get re-use (like mapping in the docker build context's PEX, not copying it)?

enough-analyst-54434

07/12/2022, 7:51 PM

It should be the case that if 3rdparty is not re-used, then unpacking the PEX with zip will reveal different contents in the root `.deps/` dir.

bitter-ability-32190

07/12/2022, 7:51 PM

I suppose another slice of this is the pants docker plugin

enough-analyst-54434

07/12/2022, 7:52 PM

So, that's the gut check. Assuming `.deps/` has changed, you have a Pants problem; if not, you have a Pex problem.
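The gut check can be scripted. A minimal sketch with the standard library's `zipfile`, assuming both builds use the default zipapp layout (where a PEX is a zip file); it maps every file under the root `.deps/` dir to its CRC-32 and reports what differs between two builds:

```python
import zipfile


def deps_entries(pex_path):
    """Map every file under the PEX's root `.deps/` dir to its CRC-32."""
    with zipfile.ZipFile(pex_path) as zf:
        return {
            info.filename: info.CRC
            for info in zf.infolist()
            if info.filename.startswith(".deps/") and not info.is_dir()
        }


def diff_deps(old_pex, new_pex):
    """Return (added, removed, changed) `.deps/` paths between two PEX builds."""
    old, new = deps_entries(old_pex), deps_entries(new_pex)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

For example, `diff_deps("before/my-app.pex", "after/my-app.pex")` (hypothetical paths) returning three empty lists after a 1stparty-only edit means `.deps/` is unchanged, which per the gut check points at Pex rather than Pants.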

enough-analyst-54434

07/12/2022, 7:52 PM

Yeah, this could be in the Pants docker support. The zip comparison should help isolate this quickly.

enough-analyst-54434

07/12/2022, 7:57 PM

Is your `pex_binary` target depending on a `python_distribution`? That'd do it.

enough-analyst-54434

07/12/2022, 7:58 PM

That's the one designed-in path I'm aware of for 1st party to show up as 3rdparty.

bitter-ability-32190

07/12/2022, 8:05 PM

No, I don't think so. I'm leaning towards Pants as the cause on this one. I'll try looking into it further tomorrow; inspecting the zip is a great idea.

bitter-ability-32190

07/12/2022, 8:43 PM

I think I might also be failing to understand docker's caching. If the input file (the `.pex`) has changed, how does it know not to try and re-create the image?

bitter-ability-32190

07/12/2022, 8:44 PM

Also, my test here is:
• Build the image
• Run `docker images`
• Add a comment to a first-party source
• Re-build the image
• Re-run `docker images`

I see two `<none>` images being built each time in addition to the tagged image.

enough-analyst-54434

07/12/2022, 8:55 PM

Docker caches "layers"; basically the effect of each RUN or COPY against the prior fs layer.

enough-analyst-54434

07/12/2022, 8:56 PM

It skips through the layers until it hits a change.

bitter-ability-32190

07/12/2022, 8:56 PM

Wouldn't the result of the `COPY` then be uncached, because the PEX is changing?

enough-analyst-54434

07/12/2022, 8:58 PM

No. In the example the COPY uses `as`, which just copies that portion of the input PEX, not the whole thing.

enough-analyst-54434

07/12/2022, 8:58 PM

So the deps portion COPY layer should not change if deps don't change.

enough-analyst-54434

07/12/2022, 8:59 PM

So you should hit a skip for that layer.

enough-analyst-54434

07/12/2022, 8:59 PM

Then the src layer, which is very purposefully ordered to COPY after the deps layer, does produce a new layer.

enough-analyst-54434

07/12/2022, 8:59 PM

So you rebuild, but only that 1 src layer at the end.

enough-analyst-54434

07/12/2022, 9:00 PM

It's the `as` / `--from` pair that gets you this fanciness. This trick did not always exist and is relatively new (maybe 5 years old, but not always around).

bitter-ability-32190

07/12/2022, 9:02 PM

How does the image layer with the `COPY` named `deps` not get invalidated with the new contents of `my-app.pex`?

bitter-ability-32190

07/12/2022, 9:02 PM

Even if only the firstparty changed, the overall file changed, right? Sorry, I think I'm missing something obvious

enough-analyst-54434

07/12/2022, 9:04 PM

Well it does, but everything after it short-circuits. So I guess you're saying COPY whole.pex is slow then - aka you have a huge pex?

enough-analyst-54434

07/12/2022, 9:05 PM

You should see docker saying you get a cache hit on that deps layer even if slow.

bitter-ability-32190

07/12/2022, 9:06 PM

Ah, you're saying that `COPY --from=deps /my-app /my-app` is "fast" because that's fast, then?

bitter-ability-32190

07/12/2022, 9:06 PM

It still "builds" the images for `deps` and `srcs`, only to produce the same `deps` image, so the work involved when doing the final thing is re-used.

bitter-ability-32190

07/12/2022, 9:07 PM

And it is ordered as such because `deps` changing is much less frequent than `srcs` changing.

enough-analyst-54434

07/12/2022, 9:07 PM

I'm slightly lost. I'm only saying whatever https://pex.readthedocs.io/en/latest/recipes.html#pex-app-in-a-container says.

enough-analyst-54434

07/12/2022, 9:08 PM

Maybe sharpen the pain / reason for questions. Are the steps slower than you'd expect?

bitter-ability-32190

07/12/2022, 9:09 PM

Yeah I think I see the "idea" and understand it well enough now. Doing the steps above with a clean docker I see 6 images

bitter-ability-32190

07/12/2022, 9:09 PM

I would expect 5

enough-analyst-54434

07/12/2022, 9:10 PM

You should always get the same number of layers, the only difference will be which are cache hits for the purposes of whether the next layer needs to run or not.

enough-analyst-54434

07/12/2022, 9:10 PM

So, if you had deps, run1, run2, srcs as your layers, run1 and run2 would truly short circuit here.

enough-analyst-54434

07/12/2022, 9:11 PM

Since a RUN layer just takes the hash of the RUN command string.

enough-analyst-54434

07/12/2022, 9:11 PM

Unlike a COPY layer that takes the hash of the copied-in items.
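A toy model of that RUN vs COPY distinction: each layer's cache key chains off its parent's key, a RUN key adds only the command text, and a COPY key adds the copied file's content. This is a simplification assumed for illustration, not Docker's actual implementation, and the instruction list and file names in the usage are made up:

```python
import hashlib


def _sha(*parts):
    """Hash a sequence of str/bytes parts with a separator between them."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part if isinstance(part, bytes) else part.encode())
        h.update(b"\0")
    return h.hexdigest()


def layer_keys(instructions, files, base="FROM base"):
    """Toy cache key per layer; instructions are ("RUN"|"COPY", arg) pairs."""
    keys = []
    parent = _sha(base)
    for kind, arg in instructions:
        if kind == "RUN":
            # RUN: only the parent key and the command string feed the key.
            key = _sha(parent, "RUN", arg)
        elif kind == "COPY":
            # COPY: the parent key, the instruction and the copied content.
            key = _sha(parent, "COPY", arg, files[arg])
        else:
            raise ValueError(f"unsupported instruction: {kind}")
        keys.append(key)
        parent = key
    return keys


def cached_prefix(old_keys, new_keys):
    """Count leading layers whose keys match, i.e. the layers that are skipped."""
    hits = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        hits += 1
    return hits
```

With steps `COPY deps.txt`, `RUN install deps`, `RUN compile deps`, `COPY src.py`, changing only `src.py` keeps the first three keys, so only the final layer is rebuilt; changing `deps.txt` misses every layer, because each key includes its parent's.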

enough-analyst-54434

07/12/2022, 9:12 PM

Maybe that sharpens it up?

bitter-ability-32190

07/12/2022, 9:12 PM

oh yes I see

bitter-ability-32190

07/12/2022, 9:12 PM

I need to move my `COPY` instructions higher up.

bitter-ability-32190

07/12/2022, 9:12 PM

I'm hopping off work, but I think that'll bear fruit.

bitter-ability-32190

07/12/2022, 9:23 PM

I'm also seeing long build times, but I think that's comparatively normal? My images are ~7GB. I'll have to see why.

bitter-ability-32190

07/12/2022, 9:24 PM

I also wonder what it'd look like to export 2 PEXes: one with only deps and one with only sources. Then do the multi-stage build using their respective PEXes.

enough-analyst-54434

07/12/2022, 9:24 PM

Pants could do that. Pex will not. That would mean generating invalid artifacts you have to know how to combine. I mean, Pex lets you use it that way of course! In two steps.
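A minimal sketch of those "two steps" with the Pex CLI, written as argv lists so it stays a sketch rather than a build script. It assumes a `requirements.txt`, a `src/` sources directory and an entry point `myapp.main`; all three names are hypothetical. The sources-only PEX has no requirements of its own, so it only runs when the deps PEX is merged in at runtime through the `PEX_PATH` environment variable:

```python
def two_pex_commands(requirements="requirements.txt", sources_dir="src",
                     entry_point="myapp.main"):
    """Return (build_deps, build_srcs, run_env, run) for a split deps/srcs build.

    All default argument values are hypothetical placeholders.
    """
    # Step 1: a PEX holding only the 3rdparty requirements.
    build_deps = ["pex", "-r", requirements, "-o", "deps.pex"]
    # Step 2: a PEX holding only 1st-party sources plus the entry point.
    build_srcs = ["pex", "-D", sources_dir, "-e", entry_point, "-o", "srcs.pex"]
    # At runtime, PEX_PATH merges the deps PEX into the sources PEX's environment.
    run_env = {"PEX_PATH": "deps.pex"}
    run = ["./srcs.pex"]
    return build_deps, build_srcs, run_env, run
```

Each list could be passed to `subprocess.run` (with `env={**os.environ, **run_env}` for the run step). This is also why Pex won't do it for you: neither artifact is runnable on its own.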

bitter-ability-32190

07/12/2022, 9:25 PM

oh yeah, first thought would be an in-repo pants plugin to test the waters

enough-analyst-54434

07/12/2022, 9:26 PM

Yeah, in the context of the Docker integration that would be very useful.

enough-analyst-54434

07/12/2022, 9:26 PM

Basically an internal speed hack for a well known case or else a documented hack Pants allows you to set up if you know what you're doing.

bitter-ability-32190

07/12/2022, 9:29 PM

I'd be ok with the latter for sure

bitter-ability-32190

07/12/2022, 9:37 PM

And to close the loop: unzipping the 2 images for the deps shows the exact same layer hashes.

bitter-ability-32190

07/12/2022, 9:38 PM

I think our docker invocation should maybe set some option so we don't see those images? I'll ping Andreas.

bitter-ability-32190

07/13/2022, 12:09 AM

Ah, epiphany. We might be able to offload this work using Pants itself: `experimental_shell_command`'s output as a dep, where the command does the `--compile`.

bitter-ability-32190

07/13/2022, 12:37 AM

this is promising:


experimental_shell_command(
    name="unpacked_deps",
    command=f"PEX_TOOLS=1 python3.8 {'/'.join(('..' for _ in build_file_dir().parts))}/{'.'.join(build_file_dir().parts)}/binary.pex venv --scope=deps --compile app-deps",
    dependencies=[":binary"],
    outputs=["app-deps/lib"],
    tools=["python3.8"],
)
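The path arithmetic in that `command` f-string is easy to misread. A sketch that reproduces it with `pathlib.PurePosixPath` standing in for `build_file_dir()`, assuming (as the command itself does) that it runs from the BUILD file's directory and that the packaged PEX sits at `<dotted.build.dir>/binary.pex` relative to the sandbox root; `packaged_pex_path` is a hypothetical helper name:

```python
from pathlib import PurePosixPath


def packaged_pex_path(build_file_dir, pex_name="binary.pex"):
    """Mirror the f-string in the BUILD snippet above (hypothetical helper).

    Climb back to the sandbox root with one ".." per path component, then
    descend into the dotted directory where the packaged PEX is assumed to be.
    """
    up = "/".join(".." for _ in build_file_dir.parts)
    dotted = ".".join(build_file_dir.parts)
    return f"{up}/{dotted}/{pex_name}"
```

For the `mcd/techlabs/projects/aidt/asr_service` directory that appears in the error message later in the thread, this yields `../../../../../mcd.techlabs.projects.aidt.asr_service/binary.pex`.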

bitter-ability-32190

07/13/2022, 1:00 AM

But honestly I dream of a solution where we don't build a pex just to immediately unpack it

enough-analyst-54434

07/13/2022, 1:57 AM

Yeah, that's a pretty tame dream. That should be straightforward to add to the docker integration, I'd think. Alternatively, if Pants exposed the Pex `--layout {loose,packed,zipapp}` option in `pex_binary`, you could always just configure `loose` and COPY `loose.pex/.deps` for the `deps` layer and the rest (`loose.pex/{.bootstrap,__main__.py,PEX-INFO,<1st party>}`) for the `src` layer.
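A sketch of that split for a `--layout loose` PEX: list the loose PEX directory and emit one COPY for `.deps` first, then one COPY per remaining top-level entry, so a 1stparty-only change leaves the `.deps` layer cached. The destination path is a hypothetical choice, and in a real Dockerfile the source path must be relative to the build context:

```python
import os


def loose_pex_copy_lines(loose_pex_dir, dest="/app/loose.pex"):
    """Emit Dockerfile COPY lines: `.deps` first, every other entry after it.

    `dest` is a hypothetical in-image location for the loose PEX.
    """
    entries = sorted(os.listdir(loose_pex_dir))
    lines = []
    # The 3rdparty layer goes first so it stays cached across 1stparty edits.
    if ".deps" in entries:
        lines.append(f"COPY {loose_pex_dir}/.deps {dest}/.deps")
    # .bootstrap, __main__.py, PEX-INFO and the 1st-party code follow.
    for name in entries:
        if name != ".deps":
            lines.append(f"COPY {loose_pex_dir}/{name} {dest}/{name}")
    return lines
```

Naming each top-level entry explicitly means nothing has to be excluded from a glob, since `.deps` is simply copied in its own instruction.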

bitter-ability-32190

07/13/2022, 10:22 AM

It does expose that! That's a great idea

bitter-ability-32190

07/13/2022, 2:15 PM

OK, so this does speed up the build, doesn't produce extraneous images, and works like a charm. How does this play with `venv` mode? Ideally the last image layer (I'm learning!) has the code prepped for immediate execution (while not having the files duplicated in the `COPY` destination and the final destination). I guess, too, I don't care about what mode it runs in specifically, just that it has near-native startup time (which `venv` mode promises).

bitter-ability-32190

07/13/2022, 2:22 PM

I suppose I can tweak my `experimental_shell_command` to use `--scope=all` and then use multiple `COPY` instructions for the relevant slices of the venv.

enough-analyst-54434

07/13/2022, 2:23 PM

The `--layout` is orthogonal to the runtime execution mode; so if the `loose` PEX is `--venv`, when you run the loose PEX's `__main__.py` it will bootstrap a venv under PEX_ROOT on the 1st run.

enough-analyst-54434

07/13/2022, 2:23 PM

Exactly! Do that instead.

enough-analyst-54434

07/13/2022, 2:24 PM

Actually, no - you'd want two scopes still I'd think so the logic of what to copy stays in PEX.

bitter-ability-32190

07/13/2022, 2:24 PM

Only problem is the glob-ability of `COPY` isn't exclude-friendly. Ideally I'd have `COPY path/to/lib/<not my 1stparty>` then `COPY path/to/lib/<1stparty>`.

enough-analyst-54434

07/13/2022, 2:24 PM

You can be dumber / more robust then.

bitter-ability-32190

07/13/2022, 2:24 PM

Ah right, two scopes would solve that

enough-analyst-54434

07/13/2022, 2:25 PM

So: do exactly what the pex docs recommend, but outside the container 1st, then use that to copy in the prebuilt slices of the venv.


bitter-ability-32190

07/13/2022, 2:25 PM

yup

bitter-ability-32190

07/13/2022, 2:25 PM

So then it's just a bummer we still build an "intermediate" PEX in the sandbox, but honestly not doing that is really weird because we are packaging that PEX

bitter-ability-32190

07/13/2022, 2:26 PM

These experimental shell commands are also a bit eyebrow-raising. But for now we're just experimenting. Making this cleaner is doable over time

bitter-ability-32190

07/13/2022, 2:52 PM

Ah Pants getting in the way 😕


Error expanding output globs: Failed to read link "/tmp/process-executionhCEYrB/mcd/techlabs/projects/aidt/asr_service/app-srcs/bin/python3.8": Absolute symlink: "/usr/bin/python3.8"

I think @witty-crayon-22786 fixed this, but it isn't in 2.12.

bitter-ability-32190

07/13/2022, 2:53 PM

We're kind of cheating here I suppose. Making the venv on the dev box, and then just copying it in to the docker image and assuming everything is kosher. Multi-stage unpacks in the dest image of choice. I'm gonna need to stew on this one

enough-analyst-54434

07/13/2022, 2:55 PM

Yeah. You need an interpreter / OS where you run the unpack that is compatible with where you use the unpack, since what is unpacked is influenced by the interpreter / OS used to run the unpack.
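A rough way to sanity-check that compatibility: record a few interpreter/OS facts on the machine that runs the unpack and in the target image, then compare them. This is a coarse heuristic assumed for illustration, not the full wheel-tag compatibility check that Pex and pip perform:

```python
import platform
import sys


def unpack_fingerprint():
    """Coarse interpreter/OS facts that influence what a venv unpack produces."""
    libc, libc_version = platform.libc_ver()
    return {
        "implementation": sys.implementation.name,
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "system": platform.system(),
        "machine": platform.machine(),
        "libc": f"{libc} {libc_version}".strip(),
    }


def mismatches(build_side, run_side):
    """Keys whose values differ between where you unpack and where you run."""
    return sorted(k for k in build_side if build_side[k] != run_side.get(k))
```

Run `unpack_fingerprint()` on the dev box and again inside the base image (for example via `docker run`); any key reported by `mismatches` is a reason to do the unpack inside the container instead.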

enough-analyst-54434

07/13/2022, 2:56 PM

The beauty of the PEX unpack inside the container is this is not a worry.

bitter-ability-32190

07/13/2022, 2:58 PM

right

enough-analyst-54434

07/13/2022, 3:00 PM

podman (which uses buildah) supports mounting in volumes when building an image, unlike docker: https://github.com/containers/buildah/blob/main/docs/buildah-build.1.md

enough-analyst-54434

07/13/2022, 3:00 PM

That would allow no COPY and just RUN against the mounted in PEX - not sure if that works out faster in the end or not.

enough-analyst-54434

07/13/2022, 3:01 PM

I had a great experience with podman / buildah / crun in ~2019. Haven't touched it since.

enough-analyst-54434

07/13/2022, 3:02 PM

And I think your overlords are big backers of this project IIRC.


enough-analyst-54434

07/13/2022, 3:03 PM

Ah no, Guiseppe is RedHat.

bitter-ability-32190

07/13/2022, 3:49 PM

OK, so this test is weird, although only informational as the timing doesn't matter... When I added a newline to a firstparty file, rebuilt the PEX, and re-ran the docker build, the `COPY` from the `deps` stage still ran. Admittedly the `COPY` took 4 seconds, so honestly it doesn't make a difference timing-wise. But it is a datapoint.

enough-analyst-54434

07/13/2022, 3:52 PM

Um, I'm not sure of Docker's logic, but one could imagine it takes the fingerprint of the "context" (+ the fingerprint of the COPY instruction text) as the cache key for COPY instructions. If so, the Docker context includes both the unchanged extracted deps and the changed extracted srcs and so, in sum, it has changed.

enough-analyst-54434

07/13/2022, 3:52 PM

Where the Docker "context" (Docker's terminology) is the dir tree sent to the build, typically the one rooted at the Dockerfile.

enough-analyst-54434

07/13/2022, 3:53 PM

So, that would totally explain the behavior.

enough-analyst-54434

07/13/2022, 3:53 PM

This all makes more sense if you try to imagine how you would implement (caching layers in) a docker build yourself.
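Taking that suggestion literally, here is a toy model of the two candidate cache keys for a COPY: one that hashes the whole build context and one that hashes only the files the COPY reads. The thread doesn't settle which one Docker uses; this only makes the two hypotheses concrete, and the file names in the usage are made up:

```python
import hashlib


def _digest(files, paths):
    """SHA-256 over the (path, content) pairs selected by `paths`."""
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.encode() + b"\0" + files[path] + b"\0")
    return h.hexdigest()


def copy_key_whole_context(instruction, files):
    # Hypothesis A: instruction text plus every file in the build context.
    return hashlib.sha256((instruction + _digest(files, files)).encode()).hexdigest()


def copy_key_copied_files(instruction, files, prefix):
    # Hypothesis B: instruction text plus only the files the COPY reads.
    chosen = [p for p in files if p.startswith(prefix)]
    return hashlib.sha256((instruction + _digest(files, chosen)).encode()).hexdigest()
```

Under hypothesis A, editing `extracted/srcs/app.py` changes the key of `COPY extracted/deps /app`; under hypothesis B it does not.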

bitter-ability-32190

07/13/2022, 3:56 PM

Secondly, I noticed the compilation of `--scope=deps` took ~40 seconds with `layout=zipapp`, but ~150 seconds with `layout=loose`.

bitter-ability-32190

07/13/2022, 4:00 PM

`layout=packed` seems to match `layout=zipapp`. I think if I only copy the PEX-specific bits and use `layout=packed`, we might be on to something.

bitter-ability-32190

07/13/2022, 4:09 PM

Yeah this gets the cache turned to 11 (using a packed PEX)


FROM ... as deps
COPY my/binary.pex/__main__.py /bin/app.pex/__main__.py
COPY my/binary.pex/.bootstrap /bin/app.pex/.bootstrap
COPY my/binary.pex/PEX-INFO /bin/app.pex/PEX-INFO
COPY my/binary.pex/.deps /bin/app.pex/.deps
RUN PEX_TOOLS=1 python3.8 /bin/app.pex venv --scope=deps --compile /bin/app

plus


FROM ... as srcs
COPY my/binary.pex /bin/app.pex
RUN PEX_TOOLS=1 python3.8 /bin/app.pex venv --scope=srcs --compile /bin/app

If I edit a firstparty file inside the PEX and re-run, it completely skips the `deps` stage.

bitter-ability-32190

07/13/2022, 4:10 PM

So now I just need to verify that none of `.deps` or the other `deps` stage inputs change when editing a 1stparty file with meaningless changes (like a comment).

bitter-ability-32190

07/13/2022, 4:11 PM

Shucks... I see the `deps` stage running. Oddly it doesn't run the `COPY` but does run the `RUN` 🤔

enough-analyst-54434

07/13/2022, 4:13 PM

I'm going to bow out of the live debug session; I think you know the relevant bits at play to dig on this further.

bitter-ability-32190

07/13/2022, 4:14 PM

"code_hash": "8c7b5100b86874cd4d41b5994e7ca8061ecf5403",

RIP

bitter-ability-32190

07/13/2022, 4:30 PM

So because the PEX-INFO file contains the code hash, we'll never get a true cache hit for the deps stage.
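To confirm that `code_hash` is the only PEX-INFO field that moves after a 1stparty-only edit, a small diff of the two JSON documents is enough. A sketch assuming the two PEX-INFO files have already been read from the old and new packed PEX:

```python
import json


def pex_info_diff(old_text, new_text):
    """Return the sorted top-level PEX-INFO keys whose values differ."""
    old, new = json.loads(old_text), json.loads(new_text)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

For example, `pex_info_diff(open("old/binary.pex/PEX-INFO").read(), open("new/binary.pex/PEX-INFO").read())` (hypothetical paths) returning `["code_hash"]` means that field alone is what invalidates the deps stage.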

enough-analyst-54434

07/13/2022, 4:36 PM

Then delete PEX-INFO after extracting srcs and deps? I'm a bit lost where you're at, but the venv tool code leaves PEX-INFO out of the venv it creates with `--scope deps` for this reason. It only adds it when you complete things with `--scope srcs`.

bitter-ability-32190

07/13/2022, 4:37 PM

I'm trying to skip the deps stage altogether when only 1stparty changes. But it can't skip when inputs change, and in this case PEX-INFO is an input.

enough-analyst-54434

07/13/2022, 4:38 PM

Ok, pull me in if you need help once you're settled on the hacking around explorations.

bitter-ability-32190

07/13/2022, 4:39 PM

Does PEX need the code hash? I might try deleting it and seeing what goes boom

enough-analyst-54434

07/13/2022, 4:51 PM

It doesn't for venv execution mode, no matter the layout. It does otherwise, in packed and zipapp layouts, to share code compilation in `~/.pex/user_code/<hash>/`.

bitter-ability-32190

07/13/2022, 5:04 PM

AssertionError: Expected code_hash to be populated for Spread PEX directory /bin/app.pex.

For compiling deps 😞

enough-analyst-54434

07/13/2022, 5:12 PM

Ok. Well, like I said, once you've settled hacks and have a coherent view of what you'd like to do, pull me in if needed. I have a hard time real-time tracking your debugging / coding session but can probably be pretty helpful if you present something a bit more async with time to absorb the goal, the attempted paths, etc.

bitter-ability-32190

07/13/2022, 5:14 PM

I think I'm all set minus `PEX-INFO` not staying the same after just touching 1stparty.

bitter-ability-32190

07/13/2022, 5:17 PM

All of this is very involved, in general. I think I'll boil it down to an in-repo plugin, in which case I can muck with `PEX-INFO` myself. So maybe no request, just highlighting a potential improvement.
