# general
p
We're running Pants in our CI. As suggested in the docs, we have a cache that includes `.pants/named_caches/` and `.pants/lmdb_store/`. That cache grows a lot over time. On the first run the cache was about 1.5G. After just 5 or so runs it's now almost 3G. Any idea why it'd be growing so fast, and what we could do to keep its size reasonable?
h
Can you tell which of the two is the main culprit? And if it's `named_caches`, then which subdir of that?
p
Thanks @happy-kitchen-89482. No clear main culprit:
$ du -d4 -h . | sort -h
1.4M    ./python/.pants/lmdb_store/processes
3.0M    ./python/.pants/lmdb_store/directories
1.8G    ./python/.pants/lmdb_store
1.8G    ./python/.pants/lmdb_store/files
2.4G    ./python/.pants/named_caches
2.4G    ./python/.pants/named_caches/pex_root
4.1G    ./python
4.1G    ./python/.pants
7.0G    .
Note that it's 7GB unzipped!
I did notice several things like this:
python/.pants/named_caches/pex_root/venvs/fe895c4c98ab164a917fe18823712e6b96a3199b/3686ea6f4400e8191c272028cefcc741b5083112/bin/python3.7 -> /home/companion/.pyenv/versions/3.7.10/bin/python3.7
and
python/.pants/named_caches/pex_root/venvs/b65710dfb9bf00d8584a70394e5fae7f14b4e260/3686ea6f4400e8191c272028cefcc741b5083112/bin/python3.7 -> /home/companion/.pyenv/versions/3.7.10/bin/python3.7
in the same cache. For context, we have multiple CI runners (GitLab CI with auto-scaling) and they use a shared cache stored on S3. Wondering if maybe each run looks like a new Python interpreter, so the cache essentially gets copied?
Having the cache does certainly speed up the build, at least initially, but after several days/weeks it starts to have the opposite effect as it gets so big it takes forever to download and unzip.
h
Yeah, CI caches do need to be periodically pruned/nuked.
So most of ./python/.pants/named_caches/pex_root is hard links, it's not actually taking up that much space on disk originally
but now I'm wondering what happens when you save and restore the cache
my guess is many identical files are now taking up space individually, because hard links aren't restored as such
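One way to test that guess is to compare the size of a directory when every file entry is counted against the size when each inode is counted once. This is only a rough sketch: `apparent_and_unique_size` is a hypothetical helper, not part of Pants or Pex, and it looks at file sizes rather than allocated disk blocks.

```python
import os
import stat


def apparent_and_unique_size(root):
    """Return (apparent_bytes, unique_bytes) for regular files under root.

    apparent_bytes counts every directory entry, roughly what you get
    once hard links have been expanded into separate copies.
    unique_bytes counts each (device, inode) pair once, so files shared
    through hard links are only counted a single time.
    """
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and other non-regular entries
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique
```

Running it on `.pants/named_caches/pex_root` on a fresh runner and again after a cache restore should show whether the two numbers converge, which would suggest the hard links were lost in the archive round trip.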
p
They actually appear to be stored as symlinks:
$ find  python/.pants/named_caches/pex_root/ -type l
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/3bef97400693c93b0723e1b308ba06788b8f713f/pex
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/3bef97400693c93b0723e1b308ba06788b8f713f/lib64
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/3bef97400693c93b0723e1b308ba06788b8f713f/bin/python3
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/3bef97400693c93b0723e1b308ba06788b8f713f/bin/python
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/3bef97400693c93b0723e1b308ba06788b8f713f/bin/python3.7
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/21f5571fd67eb969d6e9fa1ddc0ef8982ff99736/pex
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/21f5571fd67eb969d6e9fa1ddc0ef8982ff99736/lib64
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/21f5571fd67eb969d6e9fa1ddc0ef8982ff99736/bin/python3
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/21f5571fd67eb969d6e9fa1ddc0ef8982ff99736/bin/python
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/21f5571fd67eb969d6e9fa1ddc0ef8982ff99736/bin/python3.7
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/20f93770d8fb0cdfa22dcbd08d3fb80b588ff9fb/pex
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/20f93770d8fb0cdfa22dcbd08d3fb80b588ff9fb/lib64
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/20f93770d8fb0cdfa22dcbd08d3fb80b588ff9fb/bin/python3
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/20f93770d8fb0cdfa22dcbd08d3fb80b588ff9fb/bin/python
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/20f93770d8fb0cdfa22dcbd08d3fb80b588ff9fb/bin/python3.7
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/44bfc60ac6020ae3ea7426c49e972d6636ea7568/pex
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/44bfc60ac6020ae3ea7426c49e972d6636ea7568/lib64
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/44bfc60ac6020ae3ea7426c49e972d6636ea7568/bin/python3
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/44bfc60ac6020ae3ea7426c49e972d6636ea7568/bin/python
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/44bfc60ac6020ae3ea7426c49e972d6636ea7568/bin/python3.7
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/7da0dd6174df8800847839986e727fa8c322801e/pex
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/7da0dd6174df8800847839986e727fa8c322801e/lib64
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/7da0dd6174df8800847839986e727fa8c322801e/bin/python3
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/7da0dd6174df8800847839986e727fa8c322801e/bin/python
python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2/7da0dd6174df8800847839986e727fa8c322801e/bin/python3.7

.... a TON more ...
$ du -sh python/.pants/named_caches/pex_root/
2.4G    python/.pants/named_caches/pex_root/
hmm... well, there's something big in there...
here's the significant bits of `$ du -d2 -h python/.pants/named_caches/pex_root/ | sort -h`:
25M     python/.pants/named_caches/pex_root/venvs/063a62e6b6b144f1d1623f042472e4fbb2c3c392
26M     python/.pants/named_caches/pex_root/http/9
30M     python/.pants/named_caches/pex_root/http/6
33M     python/.pants/named_caches/pex_root/http/3
35M     python/.pants/named_caches/pex_root/http/a
35M     python/.pants/named_caches/pex_root/http/b
38M     python/.pants/named_caches/pex_root/installed_wheels/d52850263b0b05444fdad9a9786ceb78e2b0b905
41M     python/.pants/named_caches/pex_root/http/e
41M     python/.pants/named_caches/pex_root/installed_wheels/380593c3dc6fccbda7a98a24750c4841ab28ec2f
50M     python/.pants/named_caches/pex_root/http/c
53M     python/.pants/named_caches/pex_root/installed_wheels/0ce0fd1fc4d9f23148124a2268520ebdd86c8d4c
57M     python/.pants/named_caches/pex_root/installed_wheels/bdcee86135ec37b008fee8f78f9791c92c74d94e
58M     python/.pants/named_caches/pex_root/http/8
65M     python/.pants/named_caches/pex_root/http/7
69M     python/.pants/named_caches/pex_root/http/2
250M    python/.pants/named_caches/pex_root/installed_wheels/4e44548fe51f12cb20d78b8a0e081bb3d7c9280f
554M    python/.pants/named_caches/pex_root/installed_wheels
556M    python/.pants/named_caches/pex_root/http
1.2G    python/.pants/named_caches/pex_root/venvs/74a865f6390424708fcc9271ef1615c582c4ece2
1.3G    python/.pants/named_caches/pex_root/venvs
2.4G    python/.pants/named_caches/pex_root/
everything else is smaller than 25M. nothing really stands out to me.
Is there a pants command we can run as part of the build to clean out the cruft from the cache? Otherwise we have to click the "clear cache" button in the CI every so often (or do some API integration).
h
BTW, `du` will report duplicate storage sizes for multiple hard links pointing to the same inode
One thing we've done is run, in CI, a script that checks the size of those dirs and nukes them if they get too big
This is outside of Pants
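A minimal sketch of what such a script could look like, assuming it runs before the CI cache is saved. The function names and the size limits are placeholders rather than anything Pants provides, and the right limits depend on your repo. Note that this sums a size for every directory entry, so hard-linked files are counted more than once and the total errs on the large side.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all file entries under path (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def nuke_if_too_big(path, limit_bytes):
    """Delete path if it is larger than limit_bytes. Return True if deleted."""
    if not os.path.isdir(path):
        return False  # nothing cached yet
    size = dir_size_bytes(path)
    if size <= limit_bytes:
        return False
    print(f"{path} is too large ({size} bytes > {limit_bytes}), nuking it.")
    shutil.rmtree(path)
    return True
```

For example, `nuke_if_too_big(".pants/named_caches", 2 * 1024**3)` and `nuke_if_too_big(".pants/lmdb_store", 2 * 1024**3)` would cap each directory at roughly 2G. Pants then repopulates whatever was removed on the next run, at the cost of one slower build.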
I vaguely recall someone working on a Pants goal to clean up stuff like this. @bitter-ability-32190 was that you?
Or am I hallucinating it?
b
Lol that was me. I backburnered it because the idea going forward would involve subgoal rules, and since I don't think they exist, I didn't want to open that up yet
Notably `./pants caches <subgoal>`, like `report` and `clean`
👍 1
h
Probably we could start with `caches-report` and `caches-clean` (or just `clean`)?
c
Created a ticket where we can bikeshed some ideas for CLI subgoals: https://github.com/pantsbuild/pants/issues/13694
🙌 1
b
In the meantime, we could just add code to support cleaning and reporting, hiding it under `caches` (maybe marked experimental), and accept that it would be non-optimal behavior
👍 1
p
Thanks for the help!
h
Oh @plain-carpet-73994 check out a small script in the info tooltip to nuke the cache: https://www.pantsbuild.org/docs/using-pants-in-ci#directories-to-cache
p
nice! Thanks @hundreds-father-404.
❤️ 1