I'm looking the size of our Pants named cache in C...
# general
w
I'm looking at the size of our Pants named cache in CI (using `actions/init-pants`), as the total size is 10GB+. A lot of that is down to some heavy dependencies on our part (`pytorch`, `opencv`, etc.), but inspecting the directories shows a lot of the size comes from `pip` and `downloads`, both of which seem to be caches themselves. The pip directory is full of examples like `592M	/home/runner/.cache/pants/named_caches/pex_root/pip/24.0/pip_cache/http-v2/5/f/8/1/6`, while downloads is similarly opaque, with a lot of /<SHA> directories, also containing package caches. My understanding of the named cache is that if the `named-caches-hash` changes (e.g., we alter a lockfile in our repo), then the old cache becomes invalid, and the GH cache action will prefix-match the stale cache and presumably reuse part of it to build the new cache. But the cache seems to include 3 copies of the dependencies, in `installed_wheels`, `pip`, and `downloads`. Suspiciously, the sizes of the last two are very similar in our cache. Is there duplication here, and can some of these sub-caches be removed to reduce the overall cache size?
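To compare the sub-caches side by side, a per-directory size report helps. The following is a minimal sketch, not part of Pants or Pex: the helper names `dir_size_bytes` and `subdir_sizes` are made up for illustration, and the `pex_root` location is taken from the path quoted in the message. It sums apparent file sizes, so hard-linked files are counted once per path and the totals may differ from `du`.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all files under path, skipping symlinked files."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def subdir_sizes(root: Path) -> dict[str, int]:
    """Map each immediate subdirectory of root to its total size in bytes."""
    return {
        child.name: dir_size_bytes(child)
        for child in sorted(root.iterdir())
        if child.is_dir() and not child.is_symlink()
    }


if __name__ == "__main__":
    # Assumed location, based on the path quoted in the thread; adjust as needed.
    pex_root = Path.home() / ".cache/pants/named_caches/pex_root"
    if pex_root.is_dir():
        sizes = subdir_sizes(pex_root)
        for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
            print(f"{size / 1024**2:10.1f} MiB  {name}")
```

If `pip` and `downloads` report near-identical totals, that is consistent with the duplication suspected here, but it does not prove the contents are the same.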
w
This goes to a few discussions we've had about a `prune` or `clean` option for caches. A while back I was trying to figure out, practically speaking, what is prunable from there. In general, pex is probably one of the more resilient tools when it comes to cache management. I had a hypothesis (as you've suggested here) that we could probably wipe out a whole swathe of files and still be okay. What I don't recall is how the cache backtracking occurs in pex. E.g., does it assume "download" (oh, it's there) -> "install" (oh, it's there) -> use? Or is it backtracking on the cache: use (oops, not there) -> "install" (oops, not there) -> "download"?
A quick and dirty test is to start deleting and see what is re-created on the next run.
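The delete-and-observe test can be made a little more systematic by snapshotting the cache directory before the deletion, after it, and after the next run. This is only a sketch under assumptions: `snapshot` and `recreated_entries` are hypothetical helpers, and nothing here depends on how pex actually resolves its cache.

```python
from pathlib import Path


def snapshot(root: Path, depth: int = 2) -> set[str]:
    """Return the relative paths of entries under root, down to `depth` levels."""
    entries: set[str] = set()

    def walk(directory: Path, level: int) -> None:
        for child in directory.iterdir():
            entries.add(child.relative_to(root).as_posix())
            # Recurse only into real directories, and only up to the depth limit.
            if level < depth and child.is_dir() and not child.is_symlink():
                walk(child, level + 1)

    walk(root, 1)
    return entries


def recreated_entries(
    before: set[str], after_delete: set[str], after_run: set[str]
) -> list[str]:
    """Entries that were deleted and then reappeared after the next run."""
    deleted = before - after_delete
    return sorted(deleted & after_run)
```

Taking one snapshot at each of the three points and passing them to `recreated_entries` lists the paths the tooling chose to rebuild. Paths that were deleted and never come back are candidates for pruning, although a single run will not exercise every code path.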
w
That was my next step 🙂 Nuking `downloads` and `pip` at the end of the CI run dropped it to ~5.6GB, which is low enough to persist to the GHA cache 🎉 Nothing obvious has broken, but this isn't exercising all the paths.
I understand the structure a little better now though, and it's clearer what is pants vs pex vs pip
Looks like this is causing a rebuild of some PEXes during each run, so something is relying on those paths.
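The pruning step described in this thread (removing `downloads` and `pip` from the `pex_root` at the end of a CI run) could be sketched roughly as follows. The function name `prune_pex_root` is hypothetical, and the directory names are simply the ones reported in the thread. Whether they are safe to drop is unconfirmed, and given the rebuilds observed above it should be treated as an experiment and not a recommended setup.

```python
import shutil
from pathlib import Path

# Subdirectory names as reported in the thread; safety of removal is unconfirmed.
PRUNE_CANDIDATES = ("downloads", "pip")


def prune_pex_root(pex_root: Path, names=PRUNE_CANDIDATES) -> list[str]:
    """Delete the named subdirectories of pex_root; return the names removed."""
    removed = []
    for name in names:
        target = pex_root / name
        # Only remove real directories; leave symlinks and missing paths alone.
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Running this as the last CI step, before the cache is saved, keeps everything else under `pex_root` (for example `installed_wheels`) untouched.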
w
If you have example goals or code or repos that would cause it, that would be useful - because a rebuild might be a bug 🤷
w
It's non-trivial to reduce this to a simple test case right now, but I'll see if I can pull one together later.