I would like to share something really strange on my machine Pants #development


helpful-jackal-12093

11/11/2021, 3:40 PM

I would like to share something really strange on my machine regarding the local cache storage of `pants`. I admit I have a rather large project, and I’m currently using `pants` for running Python tests. After not that many runs, the `pants` cache seems to become rather large, and I suspect the problem is less its total size than its number of files. I tried to `rm -rf` the directory, but that took so long that I eventually gave up and used `rsync`, which also took a long time (tens of minutes). I decided to run a simple `find` on the directory and time it. Is this expected?

```
[devenv2] ~ ❯❯❯ time find ~/.cache/pants -type f | wc -l
 7333743
noglob find ~/.cache/pants -type f  18.45s user 609.89s system 18% cpu 56:23.34 total
wc -l  1.26s user 1.18s system 0% cpu 56:23.34 total
[devenv2] ~ ❯❯❯
```

It took a little more than 56 minutes to complete. I’m using a MacBook Pro (quad-core Intel i5, 16 GB RAM, Apple SSD) running macOS Monterey.
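To see which part of the cache holds most of those files, the single `find` over the whole tree can be split into one count per top-level subdirectory. The following is only a rough sketch: the helper names `count_files` and `files_per_subdir` are made up for illustration, and it assumes the cache lives at `~/.cache/pants` as in the command above. Like `find -type f`, it counts regular files and does not follow symlinks.

```python
import os
from pathlib import Path


def count_files(root):
    """Count regular files under root without following symlinks."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += 1
        except OSError:
            # Skip directories that vanish or cannot be read mid-scan.
            continue
    return total


def files_per_subdir(cache_root):
    """Map each immediate subdirectory of cache_root to its recursive file count."""
    counts = {}
    with os.scandir(cache_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts[entry.name] = count_files(entry.path)
    return counts


if __name__ == "__main__":
    # Assumed cache location, taken from the find command in the thread.
    root = Path.home() / ".cache" / "pants"
    if root.is_dir():
        ranked = sorted(files_per_subdir(root).items(), key=lambda kv: -kv[1])
        for name, n in ranked:
            print(f"{n:>10}  {name}")
```

On a tree with millions of files this will still be slow, since every directory has to be read once, but the per-subdirectory breakdown shows where the files are concentrated.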

curved-television-6568

11/11/2021, 3:44 PM

Interestingly enough, I too was looking into my pants cache today. I have a rather small repo (plus the pants development I’ve done) and had a cache of at least 6 GB (no idea how many files). It took a while to nuke that: minutes, though, fewer than ten…

helpful-jackal-12093

11/11/2021, 3:47 PM

¯\_(ツ)_/¯

witty-crayon-22786

11/11/2021, 7:28 PM

the pex cache under `$HOME/.cache/pants/named_caches` contains the largest number of files, although it may not be the largest in total file size

witty-crayon-22786

11/11/2021, 7:29 PM

it is used for a variety of things, but in particular, it caches all of the requirements that are used by all targets

witty-crayon-22786

11/11/2021, 7:29 PM

the “lmdb store” at `$HOME/.cache/pants/lmdb_store` will also be large, but should always contain a bounded number of files, since it’s a database. it is garbage collected by `pantsd`

witty-crayon-22786

11/11/2021, 7:30 PM

see https://github.com/pantsbuild/pants/issues/11167 about adding a goal for manually garbage collecting those caches… currently nothing collects the PEX cache, although the LMDB store should be kept to a bounded size (see above)

happy-kitchen-89482

11/12/2021, 1:54 AM

@helpful-jackal-12093 and I did some digging, and it looks like the biggest problem is the venvs in the pex cache at `~/.cache/pants/named_caches/pex_root/venvs`. You have a lot of requirements, and some of them are huge and contain many files. And you have a lot of closely overlapping but not quite identical venvs.
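One rough way to quantify "closely overlapping but not quite identical" is to compare the sets of relative file paths in two venv directories. The sketch below is only an illustration: `relative_files` and `overlap_ratio` are hypothetical helper names, it makes no assumption about how the directories under `pex_root/venvs` are laid out (you pass it two venv roots yourself), and it compares paths only, not file contents.

```python
import os


def relative_files(root):
    """Return the set of non-directory entries under root, as paths relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def overlap_ratio(venv_a, venv_b):
    """Jaccard similarity of the relative file paths in two directory trees."""
    a, b = relative_files(venv_a), relative_files(venv_b)
    union = a | b
    if not union:
        # Two empty trees are treated as identical.
        return 1.0
    return len(a & b) / len(union)
```

A ratio close to 1.0 for many pairs would suggest that the venvs largely duplicate each other on disk. Venvs built for different Python versions will score lower than their real overlap, because the version number appears in the `site-packages` path.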

happy-kitchen-89482

11/12/2021, 1:56 AM

I think we want to look at restoring the old `nondeployables` mode of `[python].resolve_all_constraints`, so you can run tests in a single global consistent venv (if you have one)

happy-kitchen-89482

11/12/2021, 1:56 AM

This would likely be a smart performance/hermeticity tradeoff here

witty-crayon-22786

11/12/2021, 2:49 AM

We measured the sandbox setup time and it was <250ms... did you observe otherwise?

happy-kitchen-89482

11/12/2021, 5:34 AM

No, we have no evidence that sandbox setup time per se is the issue. But it seems that something to do with the number of files, and possibly file size, is. So I'm not sure we're measuring the right thing (e.g., how does filesystem contention factor into those measurements?). In practice, the wall time for various operations is bad.

witty-crayon-22786

11/12/2021, 5:58 PM

getting traces (using at least https://pantsbuild.slack.com/archives/C18RRR4JK/p1636606037010500) of representative examples of warm runs 1) without edits, 2) with edits would be helpful. also, understanding exactly which cases you’re trying to optimize… have you moved on from optimizing warm runs of a single test to optimizing “run all of the tests”? if so, a trace of that would be helpful as well.
