Has anyone else had issues with `./pants publish` ...
# general
n
Has anyone else had issues with
./pants publish
being rather flaky when building docker images? --> 🧵
🙅‍♂️ 2
docker_image()
in our build files should be rather standard, the only exotic thing is that we use
image_tags=["{build_args.VERSION}"]
. To build and publish in our CI we run:
Copy code
./pants publish --changed-since="${CI_COMMIT_BEFORE_SHA}" --changed-dependees=transitive --docker-build-args=VERSION
This is the output whenever this command fails (which is easily more than half of the times):
Copy code
21:15:33.81 [INFO] Starting: Building dockerfile_parser.pex from dockerfile-parser_default.lock
21:15:33.90 [INFO] Canceled: Building dockerfile_parser.pex from dockerfile-parser_default.lock
21:15:33.90 [INFO] Starting: Building dockerfile_parser.pex from dockerfile-parser_default.lock
21:15:33.95 [INFO] Canceled: Building dockerfile_parser.pex from dockerfile-parser_default.lock
21:15:34.34 [INFO] Starting: Building docker image <http://registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:15:34.34 [INFO] Starting: Building docker image <http://registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:15:34.34 [INFO] Starting: Building docker image <http://registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:04.30 [INFO] Long running tasks:
  89.96s	Building docker image <http://registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
  89.96s	Building docker image <http://registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
  89.96s	Building docker image <http://registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:33.80 [INFO] Completed: Building docker image <http://registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:33.80 [INFO] Built docker image: <http://registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:33.94 [INFO] Completed: Building docker image <http://registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:33.94 [INFO] Built docker image: <http://registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:34.35 [INFO] Long running tasks:
  120.01s	Building docker image <http://registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:34.58 [INFO] Completed: Building docker image <http://registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
21:17:34.59 [INFO] Built docker image: <http://registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58>
The push refers to repository [<http://registry.example.com/path/to/bazqux|registry.example.com/path/to/bazqux>]
63a023c9fec0: Preparing
b685cdb4737b: Preparing
0b031aac6569: Preparing
0b031aac6569: Layer already exists
b685cdb4737b: Layer already exists
63a023c9fec0: Pushed
a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58: digest: sha256:1d0df33481f7cfdc6e4f63cffb57ac1c5de94cada9765543d3371727d611b1dc size: 949
The push refers to repository [<http://registry.example.com/path/to/example|registry.example.com/path/to/example>]
An image does not exist locally with the tag: <http://registry.example.com/path/to/example|registry.example.com/path/to/example>
The push refers to repository [<http://registry.example.com/path/to/foobar|registry.example.com/path/to/foobar>]
7cc422c5de59: Preparing
b685cdb4737b: Preparing
0b031aac6569: Preparing
0b031aac6569: Layer already exists
b685cdb4737b: Layer already exists
7cc422c5de59: Pushed
a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58: digest: sha256:72dfdba86171557dc49c8ada19d3b158b078917edf055fa996d4f1b0c2a7e3f4 size: 949
✓ <http://registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/bazqux:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58> published.
✕ <http://registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/example:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58> failed.
✓ <http://registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58|registry.example.com/path/to/foobar:a2cb4ea1e421e7a6d2b899d6b93be53b2501ed58> published.
When re-running this job it can either succeed or fail again, but if it fails it's not necessarily the same Docker image failing again. It's not a lot to go by out of the box. I'll see if I can get enable some more debug-worthy logs during office hours tomorrow, but I thought I'd at least mention it in case someone else has had similar issues. Perhaps related to the remote caching issue that was fixed the other week that was also seemingly happening at random?
c
By the looks of it, the failing image does not exist however it claims to have been built. Can you verify if it does exist or not using
docker images
? Also, if you run with
./pants -ldebug
we get the build output from Docker in the log, in case there’s a hint from that step. (if you’re on a recent enough version of Pants, I think 2.11.x is recent enough, can double check if needed)
Is it flaky when running locally as well, or during CI only? Thinking there may be some automatic image cleaning service on a build slave..?
n
Thanks! We're on 2.11.0. The only automatic image cleaning I'm aware of is a cronjob that runs
docker image prune
at 03:00 every night, so that shouldn't be the cause since the issue could happen at any point. I've set
PANTS_LEVEL
to
debug
now, so I'll get back with more logs whenever the issue happens again.
👍 1
c
And locally?
n
I ran it locally a few times just now and I can't get it to fail, so might be something going on in the CI env then.
c
Ok, will be interesting to learn what you find in the debug logs, then. 🙂
n
Debug logs does not make me any wiser.
Copy code
11:13:32.46 [INFO] Completed: Building docker image <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
11:13:32.47 [DEBUG] Completed: Scheduling: Building docker image <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
11:13:32.47 [DEBUG] Docker build output for <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>:
stdout:
...
Successfully tagged <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
stderr:
11:13:32.47 [INFO] Built docker image: <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
11:13:32.47 [DEBUG] Execute InteractiveProcess(argv=('/usr/bin/docker', 'push', '<http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>'), env=FrozenDict({'DOCKER_CONFIG': '/home/gitlab-runner/gitlab-runner-docker-3', 'DOCKER_HOST': '<tcp://dockerdaemon:2375/'}>), input_digest=Digest('e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 0), run_in_workspace=False, forward_signals_to_process=True, restartable=False, append_only_caches=FrozenDict({}))
The push refers to repository [<http://registry.example.com/path/to/image|registry.example.com/path/to/image>]
An image does not exist locally with the tag: <http://registry.example.com/path/to/image|registry.example.com/path/to/image>
✕ <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca> failed.
11:13:32.54 [DEBUG] Publish result data:
[
  {
    "exit_code": 1,
    "names": [
      "<http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>"
    ],
    "published": false,
    "publisher": "docker",
    "registries": [
      "<all default registries>"
    ],
    "status": "failed",
    "target": "path/to/project:docker"
  }
]
11:13:32.54 [DEBUG] Completed: `publish` goal
11:13:32.54 [DEBUG] computed 1 nodes in 19.843418 seconds. there are 8857 total nodes.
Retrying the job makes it successful:
Copy code
11:21:35.54 [INFO] Completed: Building docker image <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
11:21:35.54 [DEBUG] Completed: Scheduling: Building docker image <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
11:21:35.54 [DEBUG] Docker build output for <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>:
stdout:
...
Successfully tagged <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
stderr:
11:21:35.54 [INFO] Built docker image: <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>
11:21:35.54 [DEBUG] Execute InteractiveProcess(argv=('/usr/bin/docker', 'push', '<http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>'), env=FrozenDict({'DOCKER_CONFIG': '/home/gitlab-runner/gitlab-runner-docker-3', 'DOCKER_HOST': '<tcp://dockerdaemon:2375/'}>), input_digest=Digest('e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 0), run_in_workspace=False, forward_signals_to_process=True, restartable=False, append_only_caches=FrozenDict({}))
The push refers to repository [<http://registry.example.com/path/to/image|registry.example.com/path/to/image>]
dc9d3244985f: Preparing
7c4ed18ffc11: Preparing
f99c2bd0bdfc: Preparing
8bcf066bab18: Preparing
7fce09c1d950: Preparing
1401df2b50d5: Preparing
1401df2b50d5: Waiting
7fce09c1d950: Layer already exists
f99c2bd0bdfc: Layer already exists
8bcf066bab18: Layer already exists
7c4ed18ffc11: Layer already exists
1401df2b50d5: Layer already exists
dc9d3244985f: Pushed
f975c63434f3e1f8317bb97c6f03ead8a72f7fca: digest: sha256:8567b4435be7e26dc530bf7dc03490ff23c630ee8060f7492b674f235fa22e9a size: 1583
✓ <http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca> published.
11:21:41.99 [DEBUG] Publish result data:
[
  {
    "exit_code": 0,
    "names": [
      "<http://registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca|registry.example.com/path/to/image:f975c63434f3e1f8317bb97c6f03ead8a72f7fca>"
    ],
    "published": true,
    "publisher": "docker",
    "registries": [
      "<all default registries>"
    ],
    "status": "published",
    "target": "path/to/project:docker"
  }
]
11:21:41.99 [DEBUG] Completed: `publish` goal
11:21:41.99 [DEBUG] computed 1 nodes in 18.927277 seconds. there are 8857 total nodes.
c
Hmm.. agreed. I don’t see any hints in there either. Next idea is more intrusive, and that would be to monitor what the docker daemon is up to, by listening on the
docker events
stream to see if there’s anything removing the just built image, or what.