Before I go mad and spend ages in the weeds, does anyone have... (Pants, #general)


average-breakfast-91545

05/01/2024, 7:27 AM

Before I go mad, and spend ages in the weeds, does anyone have an explanation for a weird behaviour I've seen? We use Amazon ECR as a Docker registry, and for production builds we set the tags to be immutable, i.e. once you push the tag `foo`, further pushes to `foo` are rejected. I switched our pipeline to use `{pants.hash}` as the tag for our images. The first build succeeded, but a second build failed because it tried to push to a tag that already existed. We're using a multi-stage build to produce a Docker image suited to Lambda, which we do by dumping the contents of the src and deps PEXes into /var/runtime:

```python
docker_image(
    name="ingest",
    tags=["artifact"],
    image_tags=["{pants.hash}"],
    source=None,
    instructions=[
        # Stage 0: unpack the third-party dependencies PEX into a venv.
        "FROM .dkr.ecr.eu-west-2.amazonaws.com/python-lambda-base",
        "COPY src.ingest/ingest_deps.pex /ingest.pex",
        "RUN PEX_TOOLS=1 /var/lang/bin/python3.9 /ingest.pex venv --scope=deps --compile --collisions-ok --rm pex /tmp/venv-deps",

        # Stage 1: unpack the first-party sources PEX into a separate venv.
        "FROM .dkr.ecr.eu-west-2.amazonaws.com/python-lambda-base",
        "COPY src.ingest/ingest_src.pex /ingest.pex",
        "RUN PEX_TOOLS=1 /var/lang/bin/python3.9 /ingest.pex venv --scope=srcs --compile --collisions-ok --rm pex /tmp/venv-srcs",

        # Stage 2 (final image): copy both site-packages trees into /var/runtime.
        "FROM .dkr.ecr.eu-west-2.amazonaws.com/python-lambda-base",
        "COPY --from=0 /tmp/venv-deps/lib/python3.9/site-packages/ /var/runtime",
        "COPY --from=1 /tmp/venv-srcs/lib/python3.9/site-packages/ /var/runtime",
    ],
)
```

When pushing the image, it seems that we always push the last two layers, even when the same PEXes, read from cache, are used as inputs. The `{pants.hash}` matches, so I'm guessing the inputs are exactly the same. Why does Docker decide that these are new, different layers that need pushing?
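One way to see why identical inputs can still yield a "new" layer: a Docker layer is, roughly, a tar archive identified by the SHA-256 digest of its bytes, and tar headers record each file's modification time. The sketch below is a simplified model, not how Docker builds layers (it uses an uncompressed tar, whereas real pushes use the compressed blob's digest, and `ingest/handler.py` is a hypothetical file); it only shows that changing nothing but the mtime changes the digest.

```python
import hashlib
import io
import tarfile


def layer_digest(files: dict, mtime: int) -> str:
    """Return a sha256 digest of an uncompressed tar holding `files`.

    Simplified layer model: every member gets the same mtime and all other
    header fields keep tarfile's fixed defaults (uid 0, mode 0o644, ...).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order, so only mtime varies
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()


# Hypothetical layer contents: identical bytes in every "build".
files = {"ingest/handler.py": b"def handler(event, context):\n    return 'ok'\n"}

same_a = layer_digest(files, mtime=1_700_000_000)
same_b = layer_digest(files, mtime=1_700_000_000)
later = layer_digest(files, mtime=1_700_000_060)  # same bytes, one minute later

print(same_a == same_b)  # True
print(same_a == later)   # False
```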

broad-processor-92400

05/01/2024, 7:43 AM

Several possibilities/questions come to mind:
• timestamps of files in the layer may differ: https://docs.docker.com/build/ci/github-actions/reproducible-builds/
• some non-determinism in `pex venv`
• just confirming outside Pants: does it work to do a second push of something that's already been pushed? (i.e. does ECR accept a second push if it is exactly identical, or does it block all second pushes without checking content?)
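To tell the first two possibilities apart, one could fingerprint the `/tmp/venv-deps` trees from two separate builds twice: once over relative paths and file bytes only, and once also including mtimes. This is a hypothetical diagnostic (`tree_fingerprint`, `diagnose` and the demo file are made up for illustration). "contents differ" would point at non-determinism in `pex venv` itself; one plausible but unconfirmed culprit is `--compile`, since default timestamp-based `.pyc` headers record the source file's mtime. "only mtimes differ" would point at timestamps.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def tree_fingerprint(root: str, include_mtime: bool) -> str:
    """Hash every regular file under `root`: relative path, bytes, optionally mtime."""
    h = hashlib.sha256()
    base = Path(root)
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        h.update(path.relative_to(base).as_posix().encode() + b"\0")
        h.update(hashlib.sha256(path.read_bytes()).digest())
        if include_mtime:
            h.update(str(int(path.stat().st_mtime)).encode())
    return h.hexdigest()


def diagnose(first: str, second: str) -> str:
    """Compare two directory trees, e.g. venvs unpacked by two separate builds."""
    same_bytes = tree_fingerprint(first, include_mtime=False) == tree_fingerprint(second, include_mtime=False)
    same_times = tree_fingerprint(first, include_mtime=True) == tree_fingerprint(second, include_mtime=True)
    if not same_bytes:
        return "contents differ"
    return "identical" if same_times else "only mtimes differ"


def demo() -> str:
    """Two trees with identical bytes but different mtimes."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for root, stamp in ((a, 1_700_000_000), (b, 1_700_000_060)):
            target = Path(root, "pkg", "mod.py")
            target.parent.mkdir(parents=True)
            target.write_text("VALUE = 1\n")
            os.utime(target, (stamp, stamp))  # set atime and mtime
        return diagnose(a, b)


print(demo())  # only mtimes differ
```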

average-breakfast-91545

05/01/2024, 7:45 AM

It blocks all pushes, I believe, though I'll check. Thanks for the link; I'll give that a go and see what happens. I was bamboozled because the Docker SHA documentation suggests that it ignores file modification times.
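If the documentation in question is about build-cache checksums for `COPY`/`ADD`, that would govern cache lookups rather than the bytes of the pushed layer, which is still a tar archive whose headers carry mtimes. The reproducible-builds page linked above works around this with `SOURCE_DATE_EPOCH`. A minimal sketch of the same idea, assuming a plain tar as the layer model: clamp every mtime to a fixed epoch (and blank the owner fields) before archiving, so two builds of the same tree hash identically. `reproducible_tar_digest` is a hypothetical helper, not a Docker or Pants API.

```python
import hashlib
import io
import os
import tarfile
import tempfile
from pathlib import Path


def reproducible_tar_digest(src_dir: str, epoch: int) -> str:
    """Tar `src_dir` with mtimes clamped to `epoch`, then sha256 the bytes."""

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # SOURCE_DATE_EPOCH-style clamping: nothing newer than `epoch` leaks in.
        info.mtime = min(info.mtime, epoch)
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(Path(src_dir).rglob("*")):  # stable member order
            tar.add(path, arcname=path.relative_to(src_dir).as_posix(),
                    recursive=False, filter=normalize)
    return hashlib.sha256(buf.getvalue()).hexdigest()


def demo() -> bool:
    """Build the same tree twice with different mtimes; digests should match."""
    digests = []
    for stamp in (1_700_000_000, 1_700_000_060):
        with tempfile.TemporaryDirectory() as root:
            target = Path(root, "site-packages", "ingest", "__init__.py")
            target.parent.mkdir(parents=True)
            target.write_text("")
            os.utime(target, (stamp, stamp))  # a different mtime per "build"
            digests.append(reproducible_tar_digest(root, epoch=1_600_000_000))
    return digests[0] == digests[1]


print(demo())  # True
```

Clamping (rather than overwriting) follows the usual `SOURCE_DATE_EPOCH` convention: files older than the epoch keep their own mtimes, so the epoch should be at or before anything the build produces.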

nutritious-hair-72580

05/02/2024, 1:20 AM

I think there's been some discussion on Slack in the past, but currently the Docker backend always pushes, even if there are no changes. Ah, found it.
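A toy model of that interaction, assuming (as suspected earlier in the thread, but not confirmed) that an immutable ECR repository rejects any push to an existing tag even when the content is byte-identical. Everything here (`ImmutableTagRegistry`, `push_if_absent`, the tag and digest strings) is hypothetical; it shows why a pipeline that always pushes fails on the second run with an unchanged `{pants.hash}`, and what a check-before-push guard could look like.

```python
class TagImmutableError(Exception):
    """Raised when a tag that already exists is pushed again."""


class ImmutableTagRegistry:
    """Toy registry mapping tag -> manifest digest; tags are never overwritten."""

    def __init__(self) -> None:
        self.tags = {}

    def push(self, tag: str, manifest_digest: str) -> None:
        # Assumption: even an identical re-push is rejected (not verified for ECR).
        if tag in self.tags:
            raise TagImmutableError(f"tag {tag!r} already exists")
        self.tags[tag] = manifest_digest


def push_if_absent(registry: ImmutableTagRegistry, tag: str, digest: str) -> bool:
    """Skip the push when the tag already exists; return True only if it pushed."""
    if tag in registry.tags:
        return False
    registry.push(tag, digest)
    return True


registry = ImmutableTagRegistry()
registry.push("3f9c2b", "sha256:aaaa")      # first CI run succeeds
try:
    registry.push("3f9c2b", "sha256:aaaa")  # second run, same hash
    outcome = "pushed"
except TagImmutableError as err:
    outcome = f"rejected: {err}"

print(outcome)  # rejected: tag '3f9c2b' already exists
print(push_if_absent(registry, "3f9c2b", "sha256:aaaa"))  # False
```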

nutritious-hair-72580

05/02/2024, 1:21 AM

I think you would need to add something else to the tag, like a timestamp.
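A sketch of that suggestion, assuming the tag is assembled by a CI script rather than by Pants itself: append a UTC timestamp to the hash and check the result against Docker's tag rules (ASCII letters, digits, `_`, `.` and `-`; no leading `.` or `-`; at most 128 characters). `unique_tag` and the hash value `3f9c2b` are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Docker tag grammar: 1-128 chars of [A-Za-z0-9_.-], not starting with '.' or '-'.
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def unique_tag(pants_hash: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp to the hash so each build gets a fresh tag."""
    now = now or datetime.now(timezone.utc)
    tag = f"{pants_hash}-{now.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}"
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag


print(unique_tag("3f9c2b", datetime(2024, 5, 2, 1, 21, tzinfo=timezone.utc)))
# 3f9c2b-20240502T012100Z
```

The trade-off: immutable tags no longer collide, but the tag alone no longer tells you whether two images came from the same inputs, and every build pushes a new tag.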

average-breakfast-91545

05/02/2024, 6:54 AM

The issue isn't that the command attempts to push. It attempts to push all the layers of the image, most of which are unchanged, so for those I get the output `Layer already exists.` The issue is that for the last couple of layers, the layer does not exist, even though (in theory) the image is exactly the same as the previous run. I'll give the source epoch (`SOURCE_DATE_EPOCH`) a go and see if that makes any difference.
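The "Layer already exists" lines come from content addressing: the registry already holds a blob with that digest, so the client skips the upload. A toy model with made-up layer bytes standing in for real layers (`BlobStore` is hypothetical): if the last two layers differ by even one byte between runs, for example through an embedded timestamp, their digests change and they are uploaded again, while the unchanged base layers are skipped.

```python
import hashlib


def blob_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """Toy content-addressed store: a layer is uploaded only if its digest is new."""

    def __init__(self) -> None:
        self.blobs = set()

    def push_layers(self, layers: list) -> list:
        report = []
        for data in layers:
            digest = blob_digest(data)
            short = digest[7:19]  # 12 hex chars, like docker's short IDs
            if digest in self.blobs:
                report.append(f"{short}: Layer already exists")
            else:
                self.blobs.add(digest)
                report.append(f"{short}: Pushed")
        return report


store = BlobStore()
base = [b"base-os", b"python-runtime"]               # identical every run
store.push_layers(base + [b"deps@t1", b"srcs@t1"])   # first build
second = store.push_layers(base + [b"deps@t2", b"srcs@t2"])  # bytes differ

print([line.split(": ")[1] for line in second])
# ['Layer already exists', 'Layer already exists', 'Pushed', 'Pushed']
```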
