How are teams handling caching of `named_caches` i...
# general
c
How are teams handling caching of `named_caches` in CI? We added it to the cache but on BitBucket, the cache is only written the first time it is missed. That means if it's written on a CI run that doesn't touch most of the repo, it will be fairly incomplete. We used to be on GitLab, where the cache could be updated after each run so it would continuously expand, but then we had the problem that it would eventually grow too big and the cost of downloading it exceeded the benefit. Is anyone implementing their own cache system using something like S3?
w
We added it to the cache but on BitBucket, the cache is only written the first time it is missed.
are you able to choose the cache key? if so, would suggest choosing it based on a hash of any requirements files you have
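For CI systems that do accept a computed cache key, the suggestion above could be sketched roughly like this. It is only an illustration: the `cache_key` helper, the `named-caches` prefix, and the 16-character truncation are assumptions made for the example, not anything Pants or a CI provider defines.

```python
import hashlib
from pathlib import Path


def cache_key(paths, prefix="named-caches"):
    """Build a cache key from the contents of the given requirements files.

    Hypothetical helper: the prefix and key length are arbitrary choices.
    """
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files are listed in.
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

The key stays the same until one of the listed files changes, so a fresh cache would only be written when the requirements change.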
c
BitBucket doesn't support dynamic cache keys, unfortunately. The other problem is that it has a 1 GB max cache size and our `named_caches` is 1.5 GB.
w
the named caches effectively only contain pip/PEX artifacts
mm. yea. so it’s entirely possible that it is faster for you not to cache that directory then, and then only re-fetch it if someone misses the “native” cache
c
but we are spending 7 minutes "resolving constraints.txt"
w
in every build, or only in builds where you’ve changed requirements?
c
seems to be every build
w
hm, interesting. will follow up with you elsewhere
to close the loop here: the `named_caches` are only relevant for Python resolves when you miss Pants’ process cache (stored locally in `~/.cache/pants/lmdb_store` or remotely). you should only miss the process cache if you’ve changed the input requirements, or if they have been garbage collected / aren’t large enough. if you are comfortable re-running resolves from scratch when you change your requirements, using/saving only the process cache in CI environments might work well for you. there is upcoming work to make the size of the process cache much easier to manage.
i’ll likely add some of this to https://www.pantsbuild.org/docs/using-pants-in-ci, but if anyone has any questions/feedback before i do, would love to hear them!
h
Also for future readers - we solved the growth-without-bound problem of CI caches by purging them when they get too large, but that is obviously clunky.
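A minimal sketch of that kind of purge step, assuming it runs as a script before the CI system saves the cache. The `purge_if_too_large` helper and the size threshold are hypothetical, and this is not necessarily how the poster above implemented it.

```python
import shutil
from pathlib import Path


def dir_size_bytes(root):
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def purge_if_too_large(root, max_bytes):
    """Delete the whole cache directory if it exceeds max_bytes.

    Returns True if the directory was removed, False otherwise.
    """
    root = Path(root)
    if not root.exists():
        return False
    if dir_size_bytes(root) <= max_bytes:
        return False
    # Blunt, as noted above: the next run starts again from an empty cache.
    shutil.rmtree(root)
    return True
```

With a 1 GB limit like the one mentioned earlier in the thread, this could be called as `purge_if_too_large(cache_dir, 1024 ** 3)`, where `cache_dir` is whatever directory the CI job caches.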
p
but we are spending 7 minutes "resolving constraints.txt"
is resolution here using pip? The most recent versions of pip take far longer to resolve deps. https://github.com/pypa/pip/issues/9187
c
I believe so. There is a separate issue where the output we get from Pants is a bit confusing - once that's resolved we'll have a clearer picture of exactly where the time is spent
w
yes, it’s pip underneath.
to get output from PEX and pip, can set `--pex-verbosity=1` (https://www.pantsbuild.org/docs/reference-pex#section-verbosity). but i think that in @clean-city-64472’s case the resolve is large enough that it isn’t worth digging into