# general
f
Hi. I'm trying to build two docker images with pants, where one depends on the other. I have a simplified version of the BUILD file that reproduces the issue I have:
```
docker_image(
    name = "01-base",
    instructions = [
        "FROM python:3.11.8-slim-bookworm"
    ]
)

docker_image(
    name = "02-app",
    dependencies = [
        ":01-base"
    ],
    instructions = [
        "FROM 01-base:latest",
        "RUN echo test"
    ]
)
```
When I delete the `01-base` image and rerun `pants package` for `02-app`, the build fails because there is no `01-base` image. This happens even though both images are in fact built.
```
15:02:04.51 [INFO] Completed: Building docker image 01-base:latest
15:02:04.91 [INFO] Completed: Building docker image 02-app:latest
```
If I just rerun `pants package` again, the build succeeds. It seems to me that pants doesn't wait for the image to be actually available or something. Is this a known issue? Does my approach even make sense, or is what I'm doing not "the pants way" of thinking?
h
Hmm, I don’t reproduce. If I delete `01-base` (and the dep to it) and rebuild, then the build works because `01-base` exists in my local docker images. But if I `docker image rm` it and then rerun, then things fail, as expected, because the base image isn’t found by docker.
What error do you see when the build fails?
Oh wait, I think I am misunderstanding. When you say “When I delete the `01-base` image”, do you mean you `docker image rm` it from your local docker state, or do you mean that you delete its target from your BUILD file?
☝️ 1
I guess you mean the first thing, because now I do reproduce that behavior.
I would call this a bug
Docker is a tricky case for Pants, because it involves persistent state (the local image registry) that lives outside of Pants’s control
@curved-television-6568 is this known behavior?
f
Exactly, I meant removing the image via `docker image rm`.
h
Does the local registry make newly built images available for pull asynchronously, I wonder, so there’s a race condition
🤷 1
Ah no, sorry, this is more obvious than that
For example, if you run Pants with `--no-pantsd` you won’t reproduce this
this is because Pants has cached in memory that the first image was produced, and nothing in its state contradicts that fact
Sorry, this should have been obvious to me, slow morning
So Pants is short-circuiting all the rules that produce the base image, because they have already run successfully
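That short-circuiting can be sketched as a toy memoization model (plain Python with invented names, not actual Pants internals): the in-memory cache has no idea that the external docker state changed underneath it.

```python
# Toy model of the pantsd in-memory cache (NOT Pants internals; names invented).
registry = set()   # stands in for the local docker image store
memo = {}          # stands in for pantsd's in-memory rule cache

def build_image(name):
    if name in memo:          # cache hit: the rule is short-circuited,
        return memo[name]     # so `docker build` never re-runs
    registry.add(name)        # cache miss: actually build the image
    memo[name] = name
    return name

build_image("01-base:latest")
registry.discard("01-base:latest")   # `docker image rm` behind Pants's back
build_image("01-base:latest")        # memo hit: returns without rebuilding
assert "01-base:latest" not in registry  # cache says "built", docker disagrees
```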
But then that failed run does repopulate the local registry; I need to see how that happens
I need to take a look inside the docker backend code
f
I can confirm that it works with `--no-pantsd`. Thanks! And yes, it seems that pants repopulates the registry after the error. I would expect that if it doesn't check the cache, it would also fail on all further attempts.
h
Interesting, it looks like it does run both processes every time, but it runs them sequentially in dependency order when it works and concurrently when it doesn’t (which explains why it fails that time but the image is available next time)
I need to dive into the code to see why this is
Unless it is obvious to @curved-television-6568
c
My guess is you already said what’s going on. Pants doesn’t know the image was removed, and serves up the previous result from cache. This is quick, so it may look concurrent. But this should be true for the second image too… so we’re missing a clue to the puzzle. It may be that pants kicks off the process before the cache result (hit or miss) comes back, and only then kills the process in case of a hit. This is to not make processes needlessly wait for cache lookups (only wasting a few cpu cycles in case of a cache hit). So, my hunch is that docker is fast enough to fail the process for the second image before it’s killed by the cache hit. Maybe. 🤔
h
I think the mystery is why it does rerun the process to rebuild the base image, but concurrently
If you run with `-ldebug` you see the process actually running
Oh right, dammit, speculation…
You’re right, this is speculation “saving us” the second time around
👍 1
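If speculation is indeed the culprit, the failure-then-success pattern could look like this toy sequential model (invented names, no real concurrency, not Pants code): the app build is launched speculatively before the base-image process has repopulated the registry.

```python
# Toy model of speculative execution (NOT Pants code; names invented).
registry = set()  # stands in for the local docker image store

def build_base():
    registry.add("01-base:latest")

def build_app():
    # `docker build` for 02-app needs the base image to exist *right now*
    if "01-base:latest" not in registry:
        raise RuntimeError("FROM 01-base:latest: image not found")
    registry.add("02-app:latest")

# Run 1: the speculative app build starts before the base rule has
# repopulated the registry, so it fails fast...
failed = False
try:
    build_app()
except RuntimeError:
    failed = True
build_base()      # ...but the base process still completes,
assert failed and "01-base:latest" in registry

# Run 2: the base image now exists, so the same order succeeds.
build_app()
assert "02-app:latest" in registry
```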
Uuuugh
So probably the robust fix is for Pants to have its own internal local Docker repo, instead of using the default one
Does that make sense?
That way an outside force can’t mess with the internal state
c
That could get problematic quickly, given the lack of cache-management features, if you have many large images… I think another option could be to have a pre-build step in the docker backend that checks whether a given image already exists or not.
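A hedged sketch of that pre-build check, again as a toy model rather than the real backend (in practice `image_exists` might wrap something like `docker image inspect`):

```python
# Toy model of a cache-validation step (NOT the real docker backend).
registry = set()   # stands in for the local docker image store
memo = {}          # stands in for the in-memory rule cache

def image_exists(name):
    # real backend might shell out to `docker image inspect <name>`
    return name in registry

def build_image(name):
    # only trust the cached result if the image is still actually present
    if name in memo and image_exists(name):
        return memo[name]
    registry.add(name)
    memo[name] = name
    return name

build_image("01-base:latest")
registry.discard("01-base:latest")   # docker image rm
build_image("01-base:latest")        # cache not trusted: image is rebuilt
assert image_exists("01-base:latest")
```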
h
How would it know if it is the right image though?
f
Not sure if this changes anything, but I also tried deleting both the `01-base` and `02-app` images + deleting the pants cache. When I try running `pants package` on `02-app`, I get the same error about `01-base` not existing, even though it was built. So long story short: the same experiment, just with an empty cache.
But it still works with the `--no-pantsd` flag.
h
Yes, there are two caches in play here, the on-disk cache and the in-memory cache in pantsd.
So this is consistent with what you’re seeing. The in-memory cache is the spoilsport here.
c
> How would it know if it is the right image though?
By adding a "pants" label to the built image with the cache key.
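That idea could look something like this (a toy sketch; the label name and mechanics are invented — the real version would pass `--label` to `docker build` and read labels back via `docker image inspect`):

```python
# Toy sketch of validating a cached build via an image label (invented names).
registry = {}   # image name -> labels dict, stands in for local docker state

def build_image(name, cache_key):
    # real docker: `docker build --label pants.cache-key=<key> ...`
    registry[name] = {"pants.cache-key": cache_key}

def is_current(name, cache_key):
    # real docker: read labels back via `docker image inspect`
    labels = registry.get(name)
    return labels is not None and labels.get("pants.cache-key") == cache_key

build_image("01-base:latest", "abc123")
assert is_current("01-base:latest", "abc123")       # safe to reuse
assert not is_current("01-base:latest", "def456")   # inputs changed: rebuild
del registry["01-base:latest"]                      # docker image rm
assert not is_current("01-base:latest", "abc123")   # image gone: rebuild
```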