Trying to debug pants cache in docker (Pants #general)


cuddly-flag-68085

04/29/2021, 8:39 PM

Trying to debug pants cache in docker. Was wondering if I can get some help? 🙏 We are trying to build our pants jar inside a Docker container running on Ubuntu. To speed up the build, I was hoping to persist the pants cache for future builds. It seems like the right way to do that is this:

ENV XDG_CACHE_HOME /root/.cache


RUN --mount=type=cache,target=/root/.cache/ ./pants binary some/jar/to/build:

But none of the subsequent Docker builds use any cached data, and the build starts from scratch every time. Are there any examples of pants + Docker + caching that are known to work?

cuddly-flag-68085

04/29/2021, 9:54 PM

I can confirm that running this shows a few GB of pants cache data, yet pants never reads from the cache:

RUN --mount=type=cache,target=/root/.cache/ du --max-depth=1 -h /root/.cache

hundreds-father-404

04/29/2021, 10:14 PM

Hey @cuddly-flag-68085, I'm wondering if playing with where the cache directory is might make a difference: https://www.pantsbuild.org/docs/troubleshooting#how-to-change-your-cache-directory

cuddly-flag-68085

04/29/2021, 10:26 PM

I tried that, but no luck. It seems to be writing the cache data to the right place. I wonder if it's possible that the cache key is not deterministic in this environment. Do you happen to know where the pants cache key construction code is?

hundreds-father-404

04/29/2021, 10:33 PM

Hm, I'd be surprised if the environment were causing the cache key to be unstable. You're using the same Docker image each time, I take it?

The cache keys are computed automatically through the Rules API. tl;dr: Pants splits the steps of your build into small parts, which are modeled by "rules". Rules are pure functions of their inputs: if the inputs are the same, the cache can be used. (The engine handles impure things like reading from the file system.) See https://www.pantsbuild.org/docs/rules-api-tips#fyi-caching-semantics for more info.

The only thing I could imagine messing up the cache is the OS changing, because some processes depend on the platform. But I doubt you're doing that.

One other thing to check is to compare the output of

env

in your terminal before and after, in case certain env vars are changing? (Although Pants discards most env vars for hermeticity.)

happy-kitchen-89482

04/29/2021, 11:17 PM

Which version of Pants are you on?

happy-kitchen-89482

04/29/2021, 11:18 PM

I'm assuming 1.30.x?


happy-kitchen-89482

04/29/2021, 11:18 PM

@hundreds-father-404 I think your reply was assuming 2.x?

cuddly-flag-68085

04/29/2021, 11:58 PM

We're using 1.26.0.

cuddly-flag-68085

04/30/2021, 12:06 AM

• I'm going to look more into the exact docker command we're using. It's a little obscured by some wrappers of ours. It does seem like each invocation uses a unique --build-arg=GIT_SHA and --build-arg=GIT_COMMIT, although there are no actual code changes between runs.
• The OS/platform should be constant.
• Both within Docker and on the host machine, the env variables are constant.
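To check the env point across two real builds, one option is to print `env` inside a `RUN` step in each build, save the two dumps from the build logs, and diff them. The helper below is a hypothetical Python sketch for that comparison. `parse_env` and `diff_env` are illustrative names, not part of any tool. If the Dockerfile declares `ARG GIT_SHA` / `ARG GIT_COMMIT`, Docker exposes those values as environment variables to later `RUN` steps. That makes them the most likely variables to differ between builds.

```python
from typing import Dict, Optional, Tuple


def parse_env(dump: str) -> Dict[str, str]:
    """Parse `env` output (one NAME=value per line) into a dict.

    Lines without '=' (e.g. blank lines) are skipped; multi-line values
    are not handled by this simple sketch.
    """
    result = {}
    for line in dump.splitlines():
        name, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            result[name] = value
    return result


def diff_env(before: str, after: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old, new)} for every variable added, removed, or changed."""
    old, new = parse_env(before), parse_env(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


build1 = "HOME=/root\nPATH=/usr/bin\nGIT_SHA=abc123\n"
build2 = "HOME=/root\nPATH=/usr/bin\nGIT_SHA=def456\n"
print(diff_env(build1, build2))  # {'GIT_SHA': ('abc123', 'def456')}
```

A variable that differs here only matters to Pants if it actually reaches a cached step's inputs. As noted earlier in the thread, most env vars are discarded.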

cuddly-flag-68085

04/30/2021, 12:07 AM

If I have two independent "RUN ./pants some:thing" steps in the same Dockerfile, will the second one also use the in-memory pants cache?
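One relevant fact for this question: Docker executes each `RUN` instruction in a new container. Two `RUN ./pants ...` steps therefore run as separate processes. Nothing held only in memory carries over to the next step, including any background process started in the first step. Only what was written to disk survives, either in the image layer or in a cache mount. The toy Python sketch below imitates that by running the same hypothetical "build step" twice as separate processes and reporting hits for an in-memory cache and for an on-disk cache file. It is not Pants code. The cache key string and file name are made up.

```python
import json
import os
import subprocess
import sys
import tempfile

# Code for one simulated build step, executed in a brand-new Python process.
CHILD = r'''
import json, os, sys
cache_file = sys.argv[1]
memory_cache = {}  # lives only as long as this process
disk_cache = {}
if os.path.exists(cache_file):
    with open(cache_file) as f:
        disk_cache = json.load(f)
key = "compile:some/jar/to/build"  # hypothetical cache key
hits = {"memory": key in memory_cache, "disk": key in disk_cache}
memory_cache[key] = disk_cache[key] = "artifact"
with open(cache_file, "w") as f:
    json.dump(disk_cache, f)
print(json.dumps(hits))
'''


def run_step(cache_file: str) -> dict:
    """Run one simulated step as a separate process and return its cache hits."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD, cache_file],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.json")
    first = run_step(path)   # nothing cached yet
    second = run_step(path)  # only the on-disk cache survived
print(first, second)
```

The second step finds the entry on disk but never in memory. Under this model, reuse between `RUN` steps has to come from on-disk state.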
