# development
m
Is there a plan to provide a maximum cache size? Mine got to 30GB and doesn’t seem to shrink 😐
e
Yes. You can follow along here: https://github.com/pantsbuild/pants/issues/11167
w
~/.cache/pants/lmdb_store should already be garbage collected by pantsd periodically: it’s only ~/.cache/pants/named_caches that isn’t collected currently
which of those has the bulk of the usage?
e
I thought we gc'd but did not compact the LMDB?
Or something along those lines.
w
that’s correct. but it runs often enough that it should cap the total usage at a high water mark
m
I have like 20 venvs, each a copy of the entire dependency tree
image.png
e
The venvs are created with hardlinks. So, for example, on my machine they take 2.5 GB of actual disk space:
$ du -sh ~/.cache/pants/named_caches/pex_root/venvs/
2.5G	/home/jsirois/.cache/pants/named_caches/pex_root/venvs/
But an apparent 28GB if you count each hard link as new data (-l):
$ du -lsh ~/.cache/pants/named_caches/pex_root/venvs/
28G	/home/jsirois/.cache/pants/named_caches/pex_root/venvs/
I'm unfamiliar with ncdu. Do you know what it's counting? Is it hard link aware?
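To make the actual vs. apparent distinction concrete, here is a minimal Python sketch of hard-link-aware accounting. The function name `disk_usage` is hypothetical, and it sums file sizes (`st_size`) rather than allocated blocks, so its totals will only approximate what `du` and `du -l` report:

```python
import os
import stat
from typing import Tuple


def disk_usage(root: str) -> Tuple[int, int]:
    """Return (actual_bytes, apparent_bytes) for regular files under root.

    actual counts each inode once, roughly like plain `du`;
    apparent counts every hard link separately, roughly like `du -l`.
    Sizes are st_size, not allocated blocks, so this is an approximation.
    """
    seen = set()  # (device, inode) pairs already counted
    actual = 0
    apparent = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                # First time this inode is seen: count its data once.
                seen.add(key)
                actual += st.st_size
    return actual, apparent
```

Calling `disk_usage(os.path.expanduser("~/.cache/pants/named_caches/pex_root/venvs"))` should show the same kind of gap as the two `du` runs above whenever many venvs hard-link the same files.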
e
I looked into this and found a win for named_caches. Although symlinks and hardlinks provide roughly the same savings (obviously more inodes for hardlinks), the hardlinking was not sharing .pyc files amongst venvs. Switching to symlinks solves this, and, for example, for toolchain:
$ du -sh ~/.cache/pants/lmdb_store.tc.venv_symlinks ~/.cache/pants/named_caches/pex_root.tc.venv_symlinks
179M	/home/jsirois/.cache/pants/lmdb_store.tc.venv_symlinks
1.1G	/home/jsirois/.cache/pants/named_caches/pex_root.tc.venv_symlinks
$ du -sh ~/.cache/pants/lmdb_store.tc.venv_hardlinks ~/.cache/pants/named_caches/pex_root.tc.venv_hardlinks
179M	/home/jsirois/.cache/pants/lmdb_store.tc.venv_hardlinks
3.4G	/home/jsirois/.cache/pants/named_caches/pex_root.tc.venv_hardlinks
I'll be getting out a PR for this shortly.
w
Very cool... I guess it makes sense with so many fewer inodes, but still not obvious!
Probably a lot faster to create too...
e
It's ~nothing to do with the fewer inodes. It's to do with .pyc files now being all shared.
The pyc files are only created later, when the venvs are executed against by whatever interpreter. Now all the work of creating, say, a Python 3.6 .pyc for module foo is shared, as is the space. Before, with hard links, neither the effort nor the space was shared. Each venv had its own unique pyc.
This is similar to the silent problem we suffer with loose sources in Pants execution chroots that are only adjoined via PEX_EXTRA_SYS_PATH: we always re-compile these and never share. There, there is no real space savings since the chroots are ephemeral, but we waste effort repeatedly. Benjyt had noted this when working on getting a Django repo working with Pants some time ago: O(1sec) spent re-compiling pycs for every IT.
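The .pyc sharing effect can be reproduced with a toy layout. The sketch below is illustrative only: it assumes the symlinks point at whole package directories (the thread does not say at which level the links are made), and the name `count_distinct_pycs` and the directory layout are made up for the example. It byte-compiles one module through each toy venv and counts how many distinct .pyc files end up on disk:

```python
import os
import py_compile
import tempfile


def count_distinct_pycs(link_mode: str, venv_count: int = 3) -> int:
    """Link one shared package into several toy venvs, byte-compile the
    module via each venv's path, and count distinct .pyc files on disk."""
    with tempfile.TemporaryDirectory() as root:
        shared_pkg = os.path.join(root, "shared", "foo")
        os.makedirs(shared_pkg)
        shared_src = os.path.join(shared_pkg, "mod.py")
        with open(shared_src, "w") as f:
            f.write("VALUE = 1\n")

        pyc_locations = set()
        for i in range(venv_count):
            site_packages = os.path.join(root, f"venv{i}", "site-packages")
            os.makedirs(site_packages)
            venv_pkg = os.path.join(site_packages, "foo")
            if link_mode == "symlink":
                # Assumption: the whole package directory is symlinked.
                os.symlink(shared_pkg, venv_pkg, target_is_directory=True)
            elif link_mode == "hardlink":
                # Real directory per venv; only the source file is shared.
                os.mkdir(venv_pkg)
                os.link(shared_src, os.path.join(venv_pkg, "mod.py"))
            else:
                raise ValueError(f"unknown link_mode: {link_mode!r}")

            # py_compile writes to __pycache__ next to the path it is given.
            # (A real import would reuse an up-to-date .pyc, not rewrite it.)
            pyc_path = py_compile.compile(
                os.path.join(venv_pkg, "mod.py"), doraise=True
            )
            # Resolve symlinks to find where the .pyc physically lives.
            pyc_locations.add(os.path.realpath(pyc_path))
        return len(pyc_locations)
```

With directory symlinks, every venv's `__pycache__` resolves to the same shared directory, so there is a single .pyc; with hard-linked sources, each venv has its own real directory and therefore its own `__pycache__` and .pyc.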