I'm trying to understand the purpose/behaviour of ... (Pants #general)

broad-processor-92400

10/20/2022, 1:55 AM

I'm trying to understand the purpose/behaviour of `~/.cache/pants/named_caches`. Based on https://www.pantsbuild.org/docs/using-pants-in-ci#directories-to-cache, there's a suggestion that it should cache tools/PEXs. However, if I run a CI build with a full cache hit (key = hash of all lock files) on `named_caches` and an empty `lmdb_store`, I still see a lot of lines like the following (plus corresponding `Completed` lines):

For lint/check (ends up being ~20s of wall time spent building the pexes):

Starting: Building isort.pex from isort_default.lock
...
Starting: Building extra_type_stubs.pex
Starting: Building mypy.pex from 3rdparty/python/mypy.lock
Starting: Building 59 requirements for requirements.pex from the 3rdparty/python/default.lock resolve: ...
Starting: Building requirements_venv.pex

For testing (ends up being 2.5 minutes of wall time building pexes):

Starting: Building 42 requirements for requirements.pex from the 3rdparty/python/default.lock resolve: ...
...
Starting: Building pytest_runner.pex
...

I'd be expecting that the ~2GB in `named_caches` would eliminate the 3 minutes of building all these pexes, even if nothing else is cached, but that's apparently not the behaviour. Is there something I may be doing wrong? (A few more details in the thread)

broad-processor-92400

10/20/2022, 2:02 AM

• I cannot see any significant difference in timing or output between the build above and one with an empty `named_caches` as well (i.e. fresh `named_caches` and `lmdb_store`)

• Cache key:

pants-named-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('3rdparty/**/*.lock') }}

• The output above is from commands like `./pants lint check ::` and `./pants test ::`.

• The output from `[stats].log = true` includes info like (comments mine):

Counters:
  ...
  local_cache_requests_cached: 0 # due to empty lmdb_store, presumably
  ...
local_store_read_blob_size
  ...
  sum: 14329918098 # 14GB suggests that _some_ cache is being used, but a build with empty named_caches has a similar number

• pants version: 2.13.0

witty-crayon-22786

10/20/2022, 3:20 AM

the primary interesting named cache is for PEX… and the primary interesting subcomponent of that is the built wheels

witty-crayon-22786

10/20/2022, 3:21 AM

if your build uses mostly pre-built wheels fetched from the network, it’s possible that re-fetching them is cheaper for you.

witty-crayon-22786

10/20/2022, 3:22 AM

the lmdb store caches complete process executions: you either completely hit or completely miss for each process

witty-crayon-22786

10/20/2022, 3:22 AM

the named caches can “partially” accelerate something like a python resolve process, by skipping some individual steps inside the process (such as wheel builds)

broad-processor-92400

10/20/2022, 3:24 AM

Hm, okay. I think our external dependencies are entirely bdist/wheels, so that seems to match with what you say

broad-processor-92400

10/20/2022, 3:25 AM

I think this suggests that it'd be better for our CI to just not cache `named_caches` at all, because the time spent restoring/saving it seems to be more than it saves

broad-processor-92400

10/20/2022, 9:40 PM

Oh, I forgot to thank you for your insight yesterday... sorry and thank you! To confirm, the behaviour is that the pex files referenced in these `Starting: Building ...` / `Completed: Building ...` lines are cached in `lmdb_store` (or in a remote cache, if using), rather than `named_caches`, even the tool ones (using default lockfiles) like black/isort?

witty-crayon-22786

10/20/2022, 9:41 PM

that’s correct: you can hit/miss for the entire process using the `lmdb_store` (local) or remote cache

👍 1

cold-sugar-54376

10/24/2022, 7:50 PM

@broad-processor-92400 were you able to have luck just by not caching `named_caches` and only caching the `lmdb_store`?

broad-processor-92400

10/24/2022, 9:18 PM

yep; I've removed the named_caches cache, and that's saving us 30-45s in each CI build.

witty-crayon-22786

10/24/2022, 10:12 PM

mm: https://github.com/pantsbuild/pants/issues/14364 is also relevant here.

👍 1

cold-sugar-54376

10/25/2022, 9:03 PM

what keys do you end up caching on for lmdb store?

broad-processor-92400

10/25/2022, 9:17 PM

Our key is `pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('**/BUILD') }}-${{ hashFiles('subdir/**/*') }}` and then we restore from truncated versions of that (we're only using pants in `subdir` at the moment). We're hoping to switch to a remote cache soon-ish though.
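
For anyone wanting to try the same setup, here is a minimal sketch of a GitHub Actions step that caches only `lmdb_store` (and not `named_caches`), using the key quoted above with progressively truncated `restore-keys`. The `actions/cache@v3` version, the step name, and the exact set of truncated prefixes are assumptions rather than details from this thread; `~/.cache/pants/lmdb_store` is the default location of the Pants local store.

```yaml
# Hypothetical workflow step: cache only lmdb_store, not named_caches.
- name: Cache Pants lmdb_store
  uses: actions/cache@v3
  with:
    path: ~/.cache/pants/lmdb_store
    key: pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('**/BUILD') }}-${{ hashFiles('subdir/**/*') }}
    # Truncated versions of the key, tried in order when there is no exact hit.
    restore-keys: |
      pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-${{ hashFiles('**/BUILD') }}-
      pants-lmdb-${{ runner.os }}-${{ hashFiles('pants.toml') }}-
      pants-lmdb-${{ runner.os }}-
```

Each `restore-keys` entry is matched as a prefix, so a build whose sources changed can still start from the most recent store that shares the same `pants.toml` and BUILD files, and only the processes whose inputs changed will miss.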
