I'm trying to understand the purpose/behaviour of ...
# general
b
I'm trying to understand the purpose/behaviour of ~/.cache/pants/named_caches. Based on https://www.pantsbuild.org/docs/using-pants-in-ci#directories-to-cache, there's a suggestion that it should cache tools/PEXs. However, if I run a CI build with a full cache hit (key = hash of all lock files) on named_caches and an empty lmdb_store, I still see a lot of lines like the following (plus corresponding Completed):
For lint/check (ends up being ~20s of wall time spent building the pexes):
Starting: Building isort.pex from isort_default.lock
...
Starting: Building extra_type_stubs.pex
Starting: Building mypy.pex from 3rdparty/python/mypy.lock
Starting: Building 59 requirements for requirements.pex from the 3rdparty/python/default.lock resolve: ...
Starting: Building requirements_venv.pex
For testing (ends up being 2.5 minutes of wall time building pexes):
Starting: Building 42 requirements for requirements.pex from the 3rdparty/python/default.lock resolve: ...
...
Starting: Building pytest_runner.pex
...
I'd be expecting that the ~2GB in named_caches would eliminate the 3 minutes of building all these pexes, even if nothing else is cached, but that's apparently not the behaviour. Is there something I may be doing wrong? (A few more details in the thread)
• I cannot see any significant difference in timing or output between the build above and one with an empty named_caches too (i.e. fresh named_caches and lmdb_store)
• Cache key:
pants-named-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('3rdparty/**/*.lock') }}
• The output above is from commands like ./pants lint check :: and ./pants test ::.
• The output from [stats].log = true includes info like (# comments mine):
Counters:
  ...
  local_cache_requests_cached: 0 # due to empty lmdb_store, presumably
  ...
local_store_read_blob_size
  ...
  sum: 14329918098 # 14GB suggests that _some_ cache is being used, but a build with empty named_caches has a similar number
• pants version: 2.13.0
w
the primary interesting named cache is for PEX… and the primary interesting subcomponent of that is of built wheels
if your build uses mostly pre-built wheels fetched from the network, it’s possible that re-fetching them is cheaper for you.
the lmdb store caches complete process executions: you either completely hit or completely miss for each process
the named caches can “partially” accelerate something like a python resolve process, by skipping some individual steps inside the process (such as wheel builds)
b
Hm, okay. I think our external dependencies are entirely bdist/wheels, so that seems to match with what you say
I think this suggests that it'd be better for our CI to just not cache named_caches at all, because the time spent restoring/saving it seems to be more than it saves
Oh, I forgot to thank you for your insight yesterday... sorry and thank you!
To confirm, the behaviour is that the pex files referenced in these "Starting: Building ... Completed: Building ..." lines are cached in lmdb_store (or in a remote cache, if using one), rather than named_caches, even the tool ones (using default lockfiles) like black/isort?
w
that’s correct: you can hit/miss for the entire process using the lmdb_store (local) or remote cache
👍 1
c
@broad-processor-92400 were you able to have luck just by not caching named_caches and only caching the lmdb_store?
b
yep; I've removed the named_caches cache, and that's saving us 30-45s in each CI build.
w
👍 1
c
what keys do you end up caching on for lmdb store?
b
Our key is pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('**/BUILD') }}-${{ hashFiles('subdir/**/*') }} and then we restore from truncated versions of that (we're only using pants in subdir at the moment). We're hoping to switch to a remote cache soon-ish though.
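The "restore from truncated versions" approach described above could look roughly like the following GitHub Actions step. This is only a sketch, not the poster's actual workflow: the step name, the actions/cache version, and the ~/.cache/pants/lmdb_store path (inferred from the named_caches path earlier in the thread) are assumptions, and the exact set of fallback prefixes is a guess.

```yaml
# Hypothetical step: name, action version and path are assumptions, not from the thread.
- name: Cache Pants lmdb_store
  uses: actions/cache@v3
  with:
    path: ~/.cache/pants/lmdb_store
    # Exact key, as quoted in the thread.
    key: pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('**/BUILD') }}-${{ hashFiles('subdir/**/*') }}
    # Tried in order when the exact key misses; each entry is a shorter prefix of the key.
    restore-keys: |
      pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('**/BUILD') }}-
      pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-
      pants-lmdb-${{ runner.os }}-
```

With prefix fallbacks, a change to source files still restores the most recent store saved under a matching prefix, so unchanged processes can hit the cache while only the affected ones re-run.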