# development
h
I would like to share something really strange on my machine regarding the local cache storage of `pants`. I admit I have a rather large project, and I’m currently using `pants` to run Python tests. After not many runs, the `pants` cache seems to become rather large, and I suspect the problem is less its total size than the number of files. I tried to `rm -rf` the directory, but that took so long that I eventually gave up and used `rsync`, which also took a long time (tens of minutes). I decided to run a simple `find` on the directory and time it. Is this expected?
[devenv2] ~ ❯❯❯ time find ~/.cache/pants -type f | wc -l
 7333743
noglob find ~/.cache/pants -type f  18.45s user 609.89s system 18% cpu 56:23.34 total
wc -l  1.26s user 1.18s system 0% cpu 56:23.34 total
[devenv2] ~ ❯❯❯
It took a little more than 56 minutes to complete. I’m using a MacBook Pro (quad-core Intel i5, 16 GB RAM, Apple SSD) running macOS Monterey.
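To see which part of the cache holds the files, a per-subdirectory count can be easier to interpret than a single total. A minimal sketch, assuming the cache lives at the path used in the `find` command above; `count_files_by_subdir` is a hypothetical helper name, not part of Pants:

```python
import os
from collections import Counter
from pathlib import Path


def count_files_by_subdir(root):
    """Count non-directory entries under each immediate child directory of root."""
    root = Path(root).expanduser()
    counts = Counter()
    with os.scandir(root) as entries:
        children = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    for child in children:
        # os.walk does not descend into symlinked directories by default.
        # Unlike `find -type f`, this also counts symlinks that point to files.
        for _dirpath, _dirnames, filenames in os.walk(child):
            counts[os.path.basename(child)] += len(filenames)
    return counts
```

Calling `count_files_by_subdir("~/.cache/pants").most_common()` would list the subdirectories from most to fewest files. On a cache this size the walk itself will be slow, for the same reason `find` was.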
c
Interestingly enough, I too was looking into my pants cache today. I have a rather small repo (plus the pants development I've done) and had a cache of at least 6 GB (no idea how many files). It took a while to nuke that: minutes, though, less than ten.
h
¯\_(ツ)_/¯
w
the pex cache under `$HOME/.cache/pants/named_caches` contains the largest number of files, although it may not be the largest in total file size
it is used for a variety of things, but in particular, it caches all of the requirements that are used by all targets
the “lmdb store” at `$HOME/.cache/pants/lmdb_store` will also be large, but should always contain a bounded number of files, since it’s a database. it is garbage collected by `pantsd`
see https://github.com/pantsbuild/pants/issues/11167 about adding a goal for manually garbage collecting those caches… currently nothing collects the PEX cache, although the LMDB store should be kept to a bounded size (see above)
h
@helpful-jackal-12093 and I did some digging, and it looks like the biggest problem is the venvs in the PEX cache at `~/.cache/pants/named_caches/pex_root/venvs`. You have a lot of requirements, and some of them are huge and contain many files. You also have a lot of closely overlapping but not quite identical venvs.
I think we want to look at restoring the old `nondeployables` mode of `[python].resolve_all_constraints`, so you can run tests in a single, globally consistent venv (if you have one).
This would likely be a smart performance/hermeticity tradeoff.
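One way to check how much the venvs overlap is to compare the sets of relative file paths in each one. A rough sketch, assuming each immediate child of the directory you point it at is one unit to compare (the actual layout under `pex_root/venvs` may nest deeper, so the root may need adjusting); `relative_files` and `jaccard_overlaps` are hypothetical helper names, and only paths are compared, not file contents:

```python
import os
from itertools import combinations


def relative_files(top):
    """Return the set of file paths under top, relative to top."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), top))
    return found


def jaccard_overlaps(root):
    """Yield (dir_a, dir_b, similarity) for each pair of child directories."""
    with os.scandir(root) as entries:
        children = sorted(
            e.name for e in entries if e.is_dir(follow_symlinks=False)
        )
    files = {name: relative_files(os.path.join(root, name)) for name in children}
    for a, b in combinations(children, 2):
        union = files[a] | files[b]
        # Jaccard similarity: shared paths divided by all paths in either dir.
        score = len(files[a] & files[b]) / len(union) if union else 1.0
        yield a, b, score
```

Scores close to 1.0 would indicate nearly identical trees stored more than once. With many large venvs the path sets are held in memory all at once, so it may be worth running this on a handful of directories first.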
w
We measured the sandbox setup time and it was <250ms... did you observe otherwise?
h
No, we have no evidence that sandbox setup time per se is the issue. But something to do with the number of files, and possibly file size, seems to be. So I'm not sure we're measuring the right thing (e.g., how does filesystem contention factor into those measurements?). In practice, the wall time for various operations is bad.
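The `find` timing earlier in the thread reported 18% CPU over 56 minutes, so most of that wall time was spent waiting rather than computing. A small sketch of recording both numbers for a filesystem operation; it times a plain directory walk, not Pants sandbox setup, and `timed_walk` is a hypothetical helper name:

```python
import os
import time


def timed_walk(root):
    """Walk root and return (entry_count, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += len(filenames)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return count, wall, cpu
```

A wall time far above the CPU time suggests the process is mostly blocked on the filesystem, which a CPU-only measurement would not show.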
w
getting traces (using at least https://pantsbuild.slack.com/archives/C18RRR4JK/p1636606037010500) of representative examples of warm runs 1) without edits, 2) with edits would be helpful. also, understanding exactly which cases you’re trying to optimize… have you moved on from optimizing warm runs of a single test to optimizing “run all of the tests”? if so, a trace of that would be helpful as well.