
melodic-thailand-99227

12/09/2021, 7:59 PM
Is there a plan to provide a maximum cache size? Mine got to 30GB and doesn't seem to shrink 😐

enough-analyst-54434

12/09/2021, 8:04 PM
Yes. You can follow along here: https://github.com/pantsbuild/pants/issues/11167

witty-crayon-22786

12/09/2021, 8:06 PM
~/.cache/pants/lmdb_store should already be garbage collected by pantsd periodically; it's only ~/.cache/pants/named_caches that isn't collected currently.
Which of those has the bulk of the usage?

enough-analyst-54434

12/09/2021, 8:11 PM
I thought we gc'd but did not compact the LMDB?
Or something along those lines.

witty-crayon-22786

12/09/2021, 8:12 PM
That's correct, but it runs often enough that it should cap the total usage at a high-water mark.
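To see why GC without compaction still bounds growth, here is a toy model in plain Python. It is not LMDB's real page allocator: it simply assumes that freed space is reused by later writes but the backing file is never truncated, so the file size settles at the peak live data plus whatever gets written between GC passes.

```python
def simulate_store(events):
    """Toy model of a store that garbage collects but never compacts.

    Assumption (not LMDB's actual allocator): freed bytes go on a free list
    and are reused by later writes; the backing file only ever grows.
    events: iterable of ("write", nbytes) or ("gc", nbytes_freed).
    Returns (live_bytes, file_bytes).
    """
    live = free = file_size = 0
    for kind, n in events:
        if kind == "write":
            reused = min(free, n)      # fill previously freed space first
            free -= reused
            file_size += n - reused    # grow the file only for the remainder
            live += n
        elif kind == "gc":
            n = min(n, live)           # cannot free more than is live
            live -= n
            free += n                  # space is freed, but the file keeps its size
        else:
            raise ValueError(f"unknown event: {kind!r}")
    return live, file_size
```

For example, `simulate_store([("write", 10), ("gc", 8), ("write", 5)])` returns `(7, 10)`: the second write fits in freed space, so the file does not grow even though it never shrank after the GC.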

melodic-thailand-99227

12/09/2021, 8:14 PM
I have like 20 venvs, each a copy of the entire dependency tree.
[screenshot: image.png]

enough-analyst-54434

12/09/2021, 8:37 PM
The venvs are created with hardlinks. For example, on my machine they take 2.5GB of actual space:
$ du -sh ~/.cache/pants/named_caches/pex_root/venvs/
2.5G	/home/jsirois/.cache/pants/named_caches/pex_root/venvs/
But an apparent 28GB if you count each hard link as new data (-l):
$ du -lsh ~/.cache/pants/named_caches/pex_root/venvs/
28G	/home/jsirois/.cache/pants/named_caches/pex_root/venvs/
I'm unfamiliar with ncdu. Do you know what it's counting? Is it hard link aware?
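The difference between the two du runs comes down to whether each inode is counted once or once per link. A minimal hardlink-aware sketch of the same idea in Python follows; `disk_usage` is a hypothetical helper name, and it sums byte sizes (st_size) rather than allocated blocks, so its totals will not match du exactly.

```python
import os
import stat


def disk_usage(root):
    """Return (actual_bytes, apparent_bytes) for regular files under root.

    actual counts each inode once, the way plain `du` treats hard links;
    apparent counts every link, roughly what `du -l` reports.
    Sizes are st_size byte counts, not allocated blocks.
    Symlinks are skipped rather than followed.
    """
    seen = set()
    actual = apparent = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            apparent += st.st_size
            inode = (st.st_dev, st.st_ino)
            if inode not in seen:  # first time we see this inode
                seen.add(inode)
                actual += st.st_size
    return actual, apparent
```

Pointed at a hardlinked venv cache, e.g. `disk_usage(os.path.expanduser("~/.cache/pants/named_caches/pex_root/venvs"))`, the first number should be far smaller than the second. A tool that is not hardlink aware would report something closer to the second.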

polite-garden-50641

12/09/2021, 8:43 PM

enough-analyst-54434

12/13/2021, 1:30 AM
I looked into this and found a win for named_caches. Although symlinks and hardlinks provide roughly the same savings (obviously with more inodes for hardlinks), the hardlinking was not sharing .pyc files amongst venvs. Switching to symlinks solves this; for example, for toolchain:
$ du -sh ~/.cache/pants/lmdb_store.tc.venv_symlinks ~/.cache/pants/named_caches/pex_root.tc.venv_symlinks
179M	/home/jsirois/.cache/pants/lmdb_store.tc.venv_symlinks
1.1G	/home/jsirois/.cache/pants/named_caches/pex_root.tc.venv_symlinks
$ du -sh ~/.cache/pants/lmdb_store.tc.venv_hardlinks ~/.cache/pants/named_caches/pex_root.tc.venv_hardlinks
179M	/home/jsirois/.cache/pants/lmdb_store.tc.venv_hardlinks
3.4G	/home/jsirois/.cache/pants/named_caches/pex_root.tc.venv_hardlinks
I'll be getting out a PR for this shortly.

witty-crayon-22786

12/13/2021, 2:49 AM
Very cool... I guess it makes sense with so many fewer inodes, but still not obvious!
Probably a lot faster to create too...

enough-analyst-54434

12/13/2021, 3:46 AM
It has ~nothing to do with the fewer inodes. It's to do with the .pyc files now all being shared.
The .pyc files are only created later, when the venvs are executed by whatever interpreter. Now all the work of creating, say, a Python 3.6 .pyc for module foo is shared, as is the space. Before, with hard links, neither the effort nor the space was shared: each venv had its own unique .pyc.
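The mechanism can be checked with a small experiment. Python writes a module's bytecode to a __pycache__ directory next to the path the module was loaded from. A minimal sketch, assuming the symlinked venvs link whole package directories back to one shared copy (the exact layout PEX produces is an assumption here), shows that compiling through one venv leaves a .pyc the other venv can use, while hardlinked files sit in per-venv directories that each get their own __pycache__. The store, venv_a, venv_b and pkg/mod.py names are hypothetical.

```python
import os
import py_compile
import tempfile
from importlib.util import cache_from_source


def build_venvs(base, mode):
    """Create one shared package and two venv-like dirs that reference it.

    mode="symlink": each venv symlinks the whole package directory.
    mode="hardlink": each venv gets its own directory of hardlinked files.
    Returns the path of pkg/mod.py as seen from each venv.
    """
    store_pkg = os.path.join(base, "store", "pkg")
    os.makedirs(store_pkg)
    with open(os.path.join(store_pkg, "mod.py"), "w") as f:
        f.write("VALUE = 42\n")
    paths = []
    for venv in ("venv_a", "venv_b"):
        site = os.path.join(base, venv, "site-packages")
        os.makedirs(site)
        pkg = os.path.join(site, "pkg")
        if mode == "symlink":
            os.symlink(store_pkg, pkg)  # __pycache__ will live in the shared dir
        elif mode == "hardlink":
            os.makedirs(pkg)  # per-venv dir, so a per-venv __pycache__
            os.link(os.path.join(store_pkg, "mod.py"), os.path.join(pkg, "mod.py"))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        paths.append(os.path.join(pkg, "mod.py"))
    return paths


def pyc_shared(mode):
    """Compile mod.py via venv_a only; report whether venv_b sees the .pyc."""
    with tempfile.TemporaryDirectory() as base:
        src_a, src_b = build_venvs(base, mode)
        py_compile.compile(src_a, doraise=True)  # writes cache_from_source(src_a)
        return os.path.exists(cache_from_source(src_b))
```

`pyc_shared("symlink")` should be True and `pyc_shared("hardlink")` False, assuming no PYTHONPYCACHEPREFIX is set (that setting relocates bytecode and changes the outcome).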
This is similar to the silent problem we suffer with loose sources in Pants execution chroots that are only adjoined via PEX_EXTRA_SYS_PATH: we always re-compile these and never share them. There, there is no real space savings to be had since the chroots are ephemeral, but we waste effort repeatedly. Benjyt noted this some time ago when getting the django repo working with Pants: O(1sec) spent re-compiling .pycs for every IT.