# general
a
I desperately want to use Pants to build my Docker images, but the performance I'm seeing when building multiple images concurrently is abhorrent. Building 17 (relatively small and simple) images takes well over 4 minutes. Is there a trick to speeding up concurrent performance significantly? I'm using `PANTS_CONCURRENT=True pants package /path/to/Dockerfile` as my command. I have also tried setting `PANTS_PANTSD=False` to see if that made a difference, but it did not. This is a brand-new laptop (X1 Carbon) with 16 cores and 32 GB of RAM, so it should be more than up to the task.
f
Hey, and thanks for reaching out! The `PANTS_CONCURRENT` option controls how multiple invocations of Pants start, i.e. when you run `pants test ::` and `pants lint ::` in two different terminals.
This option doesn't control the concurrency of the packaging operations. I'm not a Docker packaging expert, but when you say you have 17 images, do you mean you have 17 Dockerfiles and a `docker_image` target for each of them? When you run `pants package ::`, Pants runs operations in parallel, and I believe Docker images are safe to build in parallel? In other words, is it safe to run `docker build -f first.Dockerfile .` and `docker build -f second.Dockerfile .`, each in its own terminal?
If that's the case, then when you run `pants package ::`, your Docker images should be built in parallel.
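To make the difference concrete, here is a minimal Python sketch (the helper names and the `projects/<service>/Dockerfile` paths are hypothetical) contrasting one Pants process per image with a single `pants package` invocation that receives every target, which lets Pants schedule the Docker builds in parallel itself.

```python
import subprocess
from typing import List, Sequence


def one_process_per_image(dockerfiles: Sequence[str]) -> List[List[str]]:
    """The slow pattern: a separate `pants package` command for every image."""
    return [["pants", "package", path] for path in dockerfiles]


def single_invocation(dockerfiles: Sequence[str]) -> List[str]:
    """The fast pattern: one command that hands Pants every image at once."""
    return ["pants", "package", *dockerfiles]


def package_all(dockerfiles: Sequence[str]) -> None:
    # A single Pants process parallelizes the builds internally.
    subprocess.run(single_invocation(dockerfiles), check=True)
```

Passing `::` (or `projects/::`) instead of an explicit list has the same effect for every `docker_image` target under that directory.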
a
That's very interesting. Building with `pants package ::` does reduce the time to 20 seconds. However, I may still be at a loss, given that I'm using Tilt as an intermediary to rebuild images on code changes. Tilt expects a `command` for each image, like this, which is why it ends up running 17 instances of Pants concurrently:
```python
custom_build(
    ref="my-service",
    command="pants package projects/my-service/Dockerfile",
    deps=["projects/my-service"]
)
```
Maybe I'm able to trick it though.
f
You could use `pants --loop package ::` to accomplish the same thing.
Oh, but it looks like Tilt does automatic redeploys.
a
Yeah, that's a good point, Josh. However, if I'm not mistaken, repeated runs of `pants package ::` should in theory be completely memoized after the first run. So I can probably substitute each command with that and rely on the caching mechanisms in Pants.
f
Oh, you'd need to explore a bit how `PANTS_CONCURRENT` works: with it enabled, Pants will start up all concurrent invocations (e.g. in other terminals) without `pantsd`; without it, each invocation waits for the others to finish before starting.
No `pantsd` means no memoization (but the local/remote cache still applies).
It would be great to have a single Pants call, if that's at all possible, IMHO.
πŸ‘ 1
f
I will say this... Pants does not play well with tools that try to run it concurrently. There's some limited support for it with `--concurrent`, but I wouldn't say this is a well-trodden path. Pants is pretty picky about producing deterministic and reproducible results from your source, so it kinda wants to be the tool that orchestrates the actions of your builds based on changes.
It's possible that Tilt could be integrated as a tool that can be published?
a
Yeah, I think you're both pretty on point. It would be incredible to have better support. I see that a lot of effort has gone into https://github.com/pantsbuild/pants/issues/7654, but I'm unsure if this is strictly related to not specifying `PANTS_CONCURRENT` or if it also brings improved concurrent performance.
Maybe I can hack it together by having a Tilt goal that runs `pants --loop package ::` and using `deps` in Tilt to have the service be redeployed on changes. That way Pants will only be invoked once. Regardless, running with `::` was a super helpful tip, thanks!
Hmm, I have not looked into `pants publish` at all; I'll have to give it a look. That could possibly work as well. That way I could have something running `pants publish` to push to my local registry, and have Tilt simply use this prebuilt image, again invalidating it manually for local development using `deps`.
f
Oh, so you have developed a custom plugin for this? Nice.
> Regardless, running with `::` was a super helpful tip, thanks!
You are very welcome. If you ever need to tune which targets you pass to a goal, you may find https://www.pantsbuild.org/docs/goals#goal-arguments helpful; e.g., you can also ignore targets by prefixing them with `-`.
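As a sketch of that ignore syntax, assuming the `-` prefix behaves as described on the goal-arguments page linked above (`projects/legacy::` is an invented directory and `package_argv` a hypothetical helper), a wrapper could assemble the argument list like this:

```python
from typing import Iterable, List


def package_argv(include: Iterable[str], ignore: Iterable[str] = ()) -> List[str]:
    """Build a `pants package` command line.

    Specs in `ignore` get a leading '-', which asks Pants to skip the
    matching targets even if an `include` spec (such as `::`) covers them.
    """
    argv = ["pants", "package", *include]
    argv.extend(f"-{spec}" for spec in ignore)
    return argv
```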
f
Hmm, if `publish` just echoed the checksum of the image to a file on publish, Tilt could be told to watch that checksum file for changes and sync/redeploy on that, with no need for manual invalidation.
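A minimal sketch of that idea, assuming a hypothetical post-build hook that already knows the new image ID (for example from `docker image inspect --format '{{.Id}}' <tag>`); nothing in this thread says `pants publish` writes such a file itself. The key detail is to rewrite the file only when the ID actually changes, so a watcher such as Tilt's `deps` fires only on a real rebuild.

```python
import os
import tempfile
from pathlib import Path


def write_if_changed(path: Path, image_id: str) -> bool:
    """Record image_id in path; return True only if the file content changed."""
    current = path.read_text().strip() if path.exists() else None
    if current == image_id:
        # Leave the file (and its mtime) alone so watchers stay quiet.
        return False
    # Write to a temporary file in the same directory, then atomically
    # replace the target so a watcher never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent))
    with os.fdopen(fd, "w") as tmp:
        tmp.write(image_id + "\n")
    os.replace(tmp_name, path)
    return True
```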
a
I'll try to whip up a working example. I think I can make do with just `pants package`, to be honest. If I get it working I'll write up a blog post about the entire process. The end goal here is to have a development environment that is identical to the one used in staging and production, with the exception of environment variables and secrets, whilst still maintaining features like hot reloading on local dev. I've had it working superbly for a while now in a non-monorepo context, so this is really just the missing piece for what I believe would be a great developer experience.
f
OK, cool, glad to hear you have an idea of what you could try!
> If I get it working I'll write up a blog post about the entire process.
Sounds awesome! Feel free to share any drafts; happy to help with proofreading/formatting/etc. if desired 🙂
a
I find it strange that repeated calls to `pants package ::` without changes to any file aren't being memoized. I would expect images whose dependencies have not been altered to be cached.
Are you guys observing the same behavior on your end?
h
Normally this would be the case, e.g., if `package` were building a .pex file or something. But Docker images are cached externally to Pants, by the Docker daemon. `docker build` doesn't produce an image file that Pants can cache; it produces a side effect in the daemon's own cache.
So imagine if you wiped the local Docker cache, and then Pants assumed it didn't need to rebuild:
you could never reintroduce that image!
a
Fair points!
h
It would be far better if `docker` produced some artifact that Pants could internally cache,
but it does not, to my knowledge.
And working with external state like that is tricky...
f
You could get the image ID from the build process and run `docker info` on it to check whether it is still there.
a
The tricky part here is that I could run `pants --loop package ::` in a background process and have Tilt bring up the stack. I could then set up file watchers to have it redeploy on file changes, but since the two are now independent, Tilt would have no way of knowing whether the image has been rebuilt yet.
f
But yeah... it could technically disappear within the same Pants run/session.
h
Well, we can't protect against that race ever
a
I'd love to hear more about how you guys are doing local development and leverage features such as hot reloading, in repositories that use Pants.
h
So maybe that `docker info` trick would work.
f
You might be able to protect against that race if you used `podman` instead of Docker and configured it to keep all of its state in directories Pants knows about and can cache, but that would be a ton of work for something that would only work on Linux.
Podman is daemonless and its state is defined entirely by its cache, config, and running processes, so it could theoretically integrate with Pants as a proper tool.
a
That's very interesting, Josh. I didn't even know Podman could be used without Kubernetes, but I see that Podman Compose exists.
f
I'm not sure that using Podman like this would even be practical
And it doesn't help your immediate problem, it's just me musing
a
I wonder how difficult it would be to create a plugin that would skip (effectively memoize) the Docker build command based on the `docker info` check you mentioned.
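One way to approximate this outside Pants is a wrapper script. The sketch below is an illustration, not a Pants feature: it tags each image with a hash of its build context and skips `docker build` when an image with that tag already exists. The helper names are hypothetical, and it ignores `.dockerignore`, build args, and base-image updates.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def context_fingerprint(context: Path) -> str:
    """Hash every file under the build context (relative path + bytes)."""
    digest = hashlib.sha256()
    for path in sorted(p for p in context.rglob("*") if p.is_file()):
        digest.update(path.relative_to(context).as_posix().encode() + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()[:16]


def build_unless_cached(name: str, context: Path, run: Runner = subprocess.run) -> str:
    """Build `name:<fingerprint>` only if the daemon doesn't already have it."""
    tag = f"{name}:{context_fingerprint(context)}"
    probe = run(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if probe.returncode != 0:
        # Unknown tag: the context changed or the image was deleted, so rebuild.
        run(["docker", "build", "-t", tag, str(context)], check=True)
    return tag
```

Because the tag is derived from the inputs, a wiped Docker cache simply looks like a missing tag and triggers a rebuild, which sidesteps the "never reintroduce that image" problem raised above.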
f
For now, Pants treats most if not all calls to `docker` commands as uncacheable in Pants itself, and that probably won't change for a bit (although that `docker info` hack might be "okay" for many use cases). I doubt a plugin could do that, though, because the non-cacheable nature is probably specified on the rule `Get` itself.
πŸ‘ 1
"session" in this context means one invocation of Pants, or one loop of
pants --loop
a
Yeah, I came across https://github.com/pantsbuild/pants/issues/14657, which I guess would also solve, or go a long way towards solving, the caching of Docker artifacts. I guess for now I'll:
1. Run `pants --loop package projects/::` in a separate process.
2. Allow Tilt to start the services and display the logs, status, etc. for my developers.
3. Integrate a button on each service into the Tilt UI to restart it on demand with the latest image.
This isn't quite as good as native hot reloading, but it should be far less taxing on the system, enables us to have faster startup times, and at least the containers will build automatically.
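Step 1 of that plan could be a small launcher like the Python sketch below (the function names are hypothetical): it starts `pants --loop package projects/::` in the background and stops it cleanly when the dev session ends, while Tilt handles steps 2 and 3.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List


def loop_argv(spec: str = "projects/::") -> List[str]:
    # `--loop` keeps Pants alive and re-runs the goal as file changes are detected.
    return ["pants", "--loop", "package", spec]


@contextmanager
def package_loop(spec: str = "projects/::", popen: Callable = subprocess.Popen) -> Iterator:
    """Run the Pants loop in the background for the duration of a with-block."""
    proc = popen(loop_argv(spec))
    try:
        yield proc
    finally:
        # Stop the loop when the dev session ends, then reap the process.
        proc.terminate()
        proc.wait()
```

For example, `tilt up` could run inside the `with` block so that quitting Tilt also stops the Pants loop.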
But as I said, Josh, I'd love to hear more about how you guys are doing local development with Pants over at Aiven. Since what I'm trying to do is difficult to achieve at the moment, I'm willing to bet there's a better way.
f
Our colleague who's working on the Docker integration is on vacation, but when he gets back we can share more.
a
For now I've ended up integrating a button directly into the Tilt UI that calls a Python script. This script simply calls out to `pants package` followed by `tilt trigger <service>` to trigger a restart once the image has been built. It doesn't work as well as true hot reloading and file watchers would, but at least it's just one click and relatively fast.
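A minimal sketch of such a script, assuming the Tilt resource name and the Pants target are passed in (the function name and argument order are hypothetical): it runs `pants package` and calls `tilt trigger` only if packaging succeeded.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def rebuild_and_restart(service: str, target: str, run: Runner = subprocess.run) -> None:
    """Package one image with Pants, then ask Tilt to re-run that resource."""
    # check=True raises on failure, so a broken build never restarts the service.
    run(["pants", "package", target], check=True)
    run(["tilt", "trigger", service], check=True)
```

Wrapping the call in an `if __name__ == "__main__":` block that reads `sys.argv` would let the Tilt button invoke it as a plain script.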