I desperately want to use Pants to build my Docker images (Pants #general)

acoustic-library-86413

07/28/2023, 1:49 PM

I desperately want to use Pants to build my Docker images, but the performance I'm seeing when building multiple images concurrently is abhorrent. Building 17 (relatively small and simple) images takes well over 4 minutes. Is there a trick to speeding up concurrent performance significantly? I'm using

PANTS_CONCURRENT=True pants package /path/to/Dockerfile

as my command. I have also tried setting

PANTS_PANTSD=False

to see if that made a difference, but it did not. This is a brand new laptop (X1 Carbon) with 16 cores and 32 GB of RAM, so it should be more than up to the task.

fresh-cat-90827

07/28/2023, 2:08 PM

hey and thanks for reaching out! The

PANTS_CONCURRENT

is used to control how multiple invocations of Pants start i.e. when you run

pants test ::

and

pants lint ::

in two different terminals.

fresh-cat-90827

07/28/2023, 2:11 PM

this option doesn't control concurrency of the packaging operations. Not a Docker packaging expert, but when you say you have 17 images, do you mean it's 17 Dockerfiles and you have a

docker_image

target for each of the files? When you do

pants package ::

, Pants runs operations in parallel, and I believe Docker images are safe to build in parallel? In other words, is it safe to run

docker build -f first.Dockerfile

and

docker build -f second.Dockerfile

each in its own terminal?

fresh-cat-90827

07/28/2023, 2:12 PM

if that's the case, then when you run

pants package ::

, your Docker images should be built in parallel

acoustic-library-86413

07/28/2023, 2:16 PM

That's very interesting. Building with

pants package ::

does reduce the time to 20 seconds. However, I may still be at a loss, given that I'm using Tilt as an intermediary to rebuild images on code changes. This expects a

command

as such, which is why it's trying to run 17 instances of Pants concurrently:

custom_build(
    ref="my-service",
    command="pants package projects/my-service/Dockerfile",
    deps=["projects/my-service"]
)

Maybe I'm able to trick it though.

flat-zoo-31952

07/28/2023, 2:20 PM

you could use

pants --loop package ::

to accomplish the same thing

flat-zoo-31952

07/28/2023, 2:21 PM

oh but it looks like Tilt does automatic redeploy

acoustic-library-86413

07/28/2023, 2:22 PM

Yeah, that's a good point, Josh. However, if I'm not mistaken then multiple

pants package ::

should in theory be completely memoized after the first run. So I can probably substitute each command with that and rely on the caching mechanisms in pants.

fresh-cat-90827

07/28/2023, 2:23 PM

oh you'd need to explore a bit how

PANTS_CONCURRENT

works as without this enabled, Pants will start up all concurrent invocations (e.g. in other terminals) without

pantsd

fresh-cat-90827

07/28/2023, 2:23 PM

pantsd

- no memoization (but still local/remote cache)

fresh-cat-90827

07/28/2023, 2:24 PM

it would be great to be able to have a single pants call, if that's at all possible, imho

👍 1
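The "single pants call" suggestion can be sketched in Python: instead of launching one Pants process per service, collect the target specs and pass them all to one invocation. This is only a sketch. The helper name `pants_package_command` and the second service path are hypothetical; only the `pants package <spec>` form comes from the thread.

```python
import shlex
from typing import Sequence


def pants_package_command(targets: Sequence[str]) -> list[str]:
    """Build one `pants package` command covering all given target specs."""
    if not targets:
        raise ValueError("at least one target spec is required")
    # De-duplicate while preserving order, so a spec listed twice appears once.
    unique = list(dict.fromkeys(targets))
    return ["pants", "package", *unique]


# Hypothetical service paths, following the layout shown in the thread.
cmd = pants_package_command(
    [
        "projects/my-service/Dockerfile",
        "projects/other-service/Dockerfile",
    ]
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would start a single Pants process, which can then schedule the individual image builds in parallel itself.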

flat-zoo-31952

07/28/2023, 2:24 PM

I will say this... Pants does not play well with tools that try to run it concurrently. There's some limited support for it with

--concurrent

but I wouldn't say this is a well-trodden path. Pants is pretty picky about producing deterministic and reproducible results from your source, so it kinda wants to be the tool that is orchestrating the actions of your builds based on changes

⬆️ 1

👍 1

flat-zoo-31952

07/28/2023, 2:25 PM

it's possible that

Tilt

could be integrated as a tool that can be

published

acoustic-library-86413

07/28/2023, 2:25 PM

Yeah, I think you're both pretty on point. It would be incredible to have better support, I see that a lot of effort has been made in https://github.com/pantsbuild/pants/issues/7654, but I'm unsure if this is strictly related to not specifying

PANTS_CONCURRENT

or if it also brings improved concurrent performance.

acoustic-library-86413

07/28/2023, 2:26 PM

Maybe I can hack it together by having a Tilt goal that runs

pants --loop package ::

and using

deps

in Tilt to have the service be redeployed on changes. That way pants will only be invoked once. Regardless, running with

::

was a super helpful tip, thanks!

acoustic-library-86413

07/28/2023, 2:28 PM

Hmm, I have not looked into

pants publish

at all, I'll have to give it a look. That could possibly work as well. That way I could have something running

pants publish

to push to my local registry, and have Tilt simply use this pre-built image, again invalidating manually for local development using

deps

fresh-cat-90827

07/28/2023, 2:30 PM

oh so you have developed a custom plugin for this? Nice.

Regardless, running with
::
was a super helpful tip, thanks!

you are very welcome. If you ever need to tune what targets you want to pass to a goal, you may find this https://www.pantsbuild.org/docs/goals#goal-arguments helpful, e.g. you can ignore targets with

as well.

flat-zoo-31952

07/28/2023, 2:31 PM

hmm if publish would just echo the checksum of the image on publish Tilt could be told to watch the checksum file for changes and sync/redeploy on that with no need for manual invalidation
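A rough Python sketch of the checksum-file idea, assuming a wrapper script (not Pants itself) obtains the image ID after a build. The function name `record_image_id` and the marker-file layout are hypothetical. The file is rewritten only when the ID changes, so a file watcher would only fire on a genuinely new image.

```python
from pathlib import Path


def record_image_id(marker: Path, image_id: str) -> bool:
    """Write image_id to marker only if it differs; return True if the file changed."""
    previous = marker.read_text().strip() if marker.exists() else None
    if previous == image_id:
        # Unchanged image: leave the file alone so watchers stay quiet.
        return False
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(image_id + "\n")
    return True
```

A watcher pointed at the marker file would then see a modification only after a rebuild produced a different image.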

acoustic-library-86413

07/28/2023, 2:42 PM

I'll try to whip up a working example. I think I can make do with just

pants package

to be honest. If I get it working I'll write up a blog post about the entire process. The end goal here is to have a development environment that is identical to the one used in staging and production, with the exception of environment variables and secrets, whilst still maintaining features like hot reloading on local dev. I've had it working superbly for a while now in a non-monorepo context, so this is really just the missing piece for what I believe would be a great developer experience.

🔥 2

fresh-cat-90827

07/28/2023, 2:49 PM

ok cool, glad to hear you have an idea what you could try!

If I get it working I'll write up a blog post about the entire process.

sounds awesome, feel free to share any drafts, happy to help with proofreading/formatting/etc if desired 🙂

❤️ 1

acoustic-library-86413

07/28/2023, 3:07 PM

I find it strange that multiple calls to

pants package ::

without making any changes to any file aren't being memoized. I would assume images whose dependencies have not been altered to be cached.

acoustic-library-86413

07/28/2023, 3:07 PM

Are you guys observing the same behavior on your end?

happy-kitchen-89482

07/28/2023, 6:24 PM

Normally this would be the case - e.g., if

package

was building a .pex file or something. But Docker images are cached externally to Pants, by the Docker daemon.

happy-kitchen-89482

07/28/2023, 6:25 PM

docker build

doesn't produce an image file that Pants can cache, it produces a side effect in its own cache.

happy-kitchen-89482

07/28/2023, 6:25 PM

So imagine if you wiped the local docker cache, and then Pants assumed it didn't need to re-build

👍 1

happy-kitchen-89482

07/28/2023, 6:25 PM

you could never reintroduce that image!

acoustic-library-86413

07/28/2023, 6:25 PM

Fair points!

happy-kitchen-89482

07/28/2023, 6:26 PM

It would be far better if

docker

produced some artifact that Pants could internally cache

happy-kitchen-89482

07/28/2023, 6:26 PM

but it does not, to my knowledge

happy-kitchen-89482

07/28/2023, 6:26 PM

And working with external state like that is tricky...

flat-zoo-31952

07/28/2023, 6:28 PM

you could get the image ID from the build process and run

docker info

on it to poll if it still is there

➕ 1
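A minimal Python sketch of that polling check. Note one substitution: `docker info` reports daemon-wide information, so this sketch uses `docker image inspect`, which exits non-zero when the given image is not present locally. The names `image_exists` and `Runner` are hypothetical, and the runner is injectable so the logic can be exercised without a Docker daemon.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit code."""
    return subprocess.run(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def image_exists(image_id: str, run: Runner = _run) -> bool:
    """Return True if the local Docker daemon still has the image."""
    # `docker image inspect` exits non-zero when the image is not found.
    return run(["docker", "image", "inspect", image_id]) == 0
```

As pointed out in the thread, a check like this cannot rule out the image disappearing right after the check succeeds.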

acoustic-library-86413

07/28/2023, 6:28 PM

The tricky part here is that I could run

pants --loop package ::

in a background process, and have Tilt bring up the stack. I could then set up file watchers to have it redeploy on file changes, but since they are now independent, Tilt would have no way of knowing if the image has been rebuilt yet.

flat-zoo-31952

07/28/2023, 6:29 PM

but yeah... it could technically disappear within the same pants run/session

happy-kitchen-89482

07/28/2023, 6:30 PM

Well, we can't protect against that race ever

acoustic-library-86413

07/28/2023, 6:30 PM

I'd love to hear more about how you guys are doing local development and leverage features such as hot reloading, in repositories that use Pants.

happy-kitchen-89482

07/28/2023, 6:30 PM

So maybe that

docker info

trick would work

flat-zoo-31952

07/28/2023, 6:31 PM

You might be able to protect against that race if you used

podman

instead of docker and configured it to keep all of its state in directories pants knows about and can cache, but that would be a ton of work for something that would only work on linux

flat-zoo-31952

07/28/2023, 6:33 PM

Podman is daemonless and its state is defined entirely by its cache, config, and running processes, so it is theoretically able to integrate with Pants as a proper tool

acoustic-library-86413

07/28/2023, 6:33 PM

That's very interesting Josh, I didn't even know Podman could be used without Kubernetes, but I see that Podman Compose exists.

flat-zoo-31952

07/28/2023, 6:34 PM

I'm not sure that using Podman like this would even be practical

flat-zoo-31952

07/28/2023, 6:35 PM

And it doesn't help your immediate problem, it's just me musing

acoustic-library-86413

07/28/2023, 6:36 PM

I wonder how difficult it would be to create a plugin that would skip (effectively memoize) the Docker build command based on the

docker info

as you mentioned.

flat-zoo-31952

07/28/2023, 6:37 PM

For now, Pants treats most if not all calls to docker commands as being uncacheable in Pants itself, and that probably won't change for a bit (although that

docker info

hack might be "okay" for many use cases). I doubt a plugin could do that though, because the non-cacheable nature is probably specified on the rule get itself

👍 1

flat-zoo-31952

07/28/2023, 6:37 PM

https://github.com/pantsbuild/pants/issues

flat-zoo-31952

07/28/2023, 6:39 PM

https://github.com/pantsbuild/pants/blob/e394c8ca37097daf5cefc60af76ff379da5389b6/src/python/pants/backend/docker/util_rules/docker_binary.py#L91 is what says we have to run docker build once per session

flat-zoo-31952

07/28/2023, 6:40 PM

"session" in this context means one invocation of Pants, or one loop of

pants --loop

acoustic-library-86413

07/28/2023, 6:46 PM

Yeah, I came across https://github.com/pantsbuild/pants/issues/14657 which I guess would also solve, or go a long way towards solving, the caching of docker artifacts. I guess for now I'll:

1. Run

pants --loop package projects/::

in a separate process

2. Allow

Tilt

to start the services, display the logs, status etc. for my developers

3. Integrate a button on each service into the Tilt UI to restart it on demand with the latest image

This isn't quite as good as native hot reloading, but it should be far less taxing on the system, enables us to have faster startup times, and at least the containers will build automatically.

acoustic-library-86413

07/28/2023, 6:49 PM

But as I said, Josh, I'd love to hear more about how you guys are doing local development with Pants over at Aiven. Since what I'm trying to do is difficult to achieve at the moment, I'm willing to bet there's a better way.

flat-zoo-31952

07/28/2023, 7:00 PM

Our colleague that’s working with the docker integration is on vacation but when he gets back we can share more

❤️ 1

acoustic-library-86413

07/29/2023, 8:39 PM

For now I've ended up integrating a button directly into the Tilt UI that calls a Python script. This script simply calls out to

pants package

followed by

tilt trigger <service>

to trigger a restart once the image has been built. It doesn't work as well as true hot reloading and file-watchers would, but at least it's just one click and relatively fast.
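The script described above is not shown in the thread; a minimal sketch of what it might look like follows. The function name `rebuild_and_trigger` and the example target and resource names are hypothetical. It runs `pants package` for one target and calls `tilt trigger` only if the build succeeded.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command, streaming its output, and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def rebuild_and_trigger(target: str, resource: str, run: Runner = _run) -> bool:
    """Package one target with Pants, then ask Tilt to update the resource."""
    if run(["pants", "package", target]) != 0:
        # Build failed: do not restart the service on a stale image.
        return False
    return run(["tilt", "trigger", resource]) == 0
```

A button in the Tilt UI could invoke this with, for example, `rebuild_and_trigger("projects/my-service/Dockerfile", "my-service")`.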
