Before I go mad, and spend ages in the weeds, does...
# general
a
Before I go mad and spend ages in the weeds, does anyone have an explanation for a weird behaviour I've seen? We use Amazon ECR as a Docker registry, and for production builds we set the tags to be immutable, i.e. once you push the tag foo, further pushes to foo are rejected. I switched our pipeline to use {pants.hash} as the tag for our images. The first build succeeded, but a second build failed because it tried to push to a tag that already existed. We're using multi-stage images to build a Docker image suited to Lambda, which we do by just dumping the contents of the src and deps pexes into /var/runtime:
docker_image(
    name="ingest",
    tags=["artifact"],
    image_tags=["{pants.hash}"],
    source=None,
    instructions=[
        "FROM .dkr.ecr.eu-west-2.amazonaws.com/python-lambda-base",
        "COPY src.ingest/ingest_deps.pex /ingest.pex",
        "RUN PEX_TOOLS=1 /var/lang/bin/python3.9 /ingest.pex venv --scope=deps --compile --collisions-ok  --rm pex /tmp/venv-deps",

        "FROM .dkr.ecr.eu-west-2.amazonaws.com/python-lambda-base",
        "COPY src.ingest/ingest_src.pex /ingest.pex",
        "RUN PEX_TOOLS=1 /var/lang/bin/python3.9 /ingest.pex venv --scope=srcs --compile --collisions-ok  --rm pex /tmp/venv-srcs",

        "FROM .dkr.ecr.eu-west-2.amazonaws.com/python-lambda-base",
        "COPY --from=0 /tmp/venv-deps/lib/python3.9/site-packages/ /var/runtime",
        "COPY --from=1 /tmp/venv-srcs/lib/python3.9/site-packages/ /var/runtime",
    ],
)
When pushing the image, it seems that we're always pushing the last two layers, even if the same pexes, read from cache, are used as inputs. The pants.hash matches, so I'm guessing that the inputs are exactly the same. Why is Docker deciding that these are new, different, layers that need pushing?
b
Several possibilities/questions come to mind:
• timestamps of files in the layer may differ: https://docs.docker.com/build/ci/github-actions/reproducible-builds/
• some non-determinism in pex venv
• just confirming outside Pants: does it work to do a second push of something that's already been pushed? (i.e. does ECR accept a second push if it is exactly identical, or does it block all second pushes without checking content?)
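To illustrate the timestamp point: a layer is identified by the SHA-256 of its tar archive, and tar headers record each file's modification time, so two layers with byte-identical files can still get different digests. The following is a minimal sketch of that effect using only the Python standard library. It is not how Docker builds layers, and layer_digest is a hypothetical helper named for this example.

```python
import hashlib
import io
import tarfile
from typing import Mapping


def layer_digest(files: Mapping[str, bytes], mtime: int) -> str:
    """Hash an in-memory tar of `files`, stamping every entry with `mtime`.

    Hypothetical helper: it only mimics the idea that a layer digest
    covers tar metadata (including mtime) as well as file contents.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # fixed ordering keeps the archive stable
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()


files = {"var/runtime/handler.py": b"print('hello')\n"}

# Same contents, different mtimes: the digests differ.
first_build = layer_digest(files, mtime=1_700_000_000)
second_build = layer_digest(files, mtime=1_700_000_060)

# Same contents, pinned mtime (the idea behind SOURCE_DATE_EPOCH): identical.
pinned_a = layer_digest(files, mtime=0)
pinned_b = layer_digest(files, mtime=0)
```

If this is what is happening, the files written by the RUN steps (and then copied into the final stage) would carry the time of each build, which would change the last layers even when the pexes themselves are identical.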
a
It blocks all pushes, I believe, though I'll check. Thanks for the link, I'll give that a go and see what happens. I was bamboozled because the docker sha documentation suggests that it ignores file modification times.
n
I think there's been some discussion on slack in the past, but currently the docker backend always pushes, even if there are no changes. Ah, found it.
I think you would need to add something else to the tag, like a timestamp.
a
The issue isn't that the command attempts to push. It attempts to push all the layers of the image, most of which are unchanged, so for those I get the output "Layer already exists." The issue is that for the last couple of layers, the layer does not exist, even though (in theory) the image is exactly the same as the previous run. I'll give the source epoch a go and see if that makes any difference.
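One way to confirm which layers actually change between two runs is to compare the layer lists of the two images. A rough sketch, assuming each list was saved from the output of docker image inspect --format '{{json .RootFS.Layers}}' for the image in question; first_divergence is a hypothetical helper, and the digests below are made-up placeholders rather than real output.

```python
import json
from typing import List, Optional


def first_divergence(old: List[str], new: List[str]) -> Optional[int]:
    """Return the index of the first layer that differs, or None if identical."""
    for index, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return index
    if len(old) != len(new):
        # One image has extra layers beyond the shared prefix.
        return min(len(old), len(new))
    return None


# Placeholder JSON standing in for the inspect output of two builds.
run_1 = json.loads('["sha256:base", "sha256:deps-aaa", "sha256:srcs-bbb"]')
run_2 = json.loads('["sha256:base", "sha256:deps-ccc", "sha256:srcs-ddd"]')

changed_from = first_divergence(run_1, run_2)
print(f"layers differ starting at index {changed_from}")
```

These are the IDs of the uncompressed layers, not the compressed blobs that the registry stores, but if they differ between two runs then the pushed layers will differ too.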