How are teams handling caching of `named_caches` in CI? Pants #general

clean-city-64472

07/15/2021, 4:04 PM

How are teams handling caching of `named_caches` in CI? We added it to the cache but on BitBucket, the cache is only written the first time it is missed. That means if it's written on a CI run that doesn't touch most of the repo it will be fairly incomplete. We used to be on Gitlab where the cache could be updated after each run so it would continuously expand - but then we had the problem that it would eventually grow too big and the cost of downloading it exceeded the benefit. Is anyone implementing their own cache system using something like S3?

witty-crayon-22786

07/15/2021, 4:17 PM

We added it to the cache but on BitBucket, the cache is only written the first time it is missed.

are you able to choose the cache key? if so, would suggest choosing it based on a hash of any requirements files you have
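
For CI systems that do allow a custom cache key, the suggestion above could look roughly like the following. This is only a minimal sketch: the function name `requirements_cache_key`, the `pants-named-caches` prefix, and the choice of SHA-256 truncated to 16 hex characters are all assumptions for illustration, not anything Pants or a CI provider prescribes.

```python
import hashlib
from pathlib import Path


def requirements_cache_key(paths, prefix="pants-named-caches"):
    """Build a cache key that changes only when a requirements file changes.

    `prefix` and the 16-character digest length are illustrative choices.
    """
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files were listed in.
    for path in sorted(Path(p) for p in paths):
        # Mix in the file name so that renaming a file also changes the key.
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

A CI job would call this with whatever requirements or constraints files the repo actually has and pass the result to the provider's cache-key setting, so the cache is rewritten only when dependencies change.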

clean-city-64472

07/15/2021, 4:18 PM

BitBucket doesn't support dynamic cache keys unfortunately. The other problem is it has a 1 GB max cache size and our named_caches is 1.5 GB.

witty-crayon-22786

07/15/2021, 4:18 PM

the named caches effectively only contain pip/PEX artifacts

witty-crayon-22786

07/15/2021, 4:19 PM

mm. yea. so it’s entirely possible that it is faster for you not to cache that directory then, and then only re-fetch it if someone misses the “native” cache

clean-city-64472

07/15/2021, 4:20 PM

but we are spending 7 minutes "resolving constraints.txt"

witty-crayon-22786

07/15/2021, 4:20 PM

in every build, or only in builds where you’ve changed requirements?

clean-city-64472

07/15/2021, 4:21 PM

seems to be every build

witty-crayon-22786

07/15/2021, 4:21 PM

hm, interesting. will follow up with you elsewhere

👍 1

witty-crayon-22786

07/15/2021, 5:42 PM

to close the loop here: the `named_caches` are only relevant for Python resolves when you miss Pants’ process cache (stored locally in `~/.cache/pants/lmdb_store` or remotely). you should only miss the process cache if you’ve changed the input requirements, or if the cached entries have been garbage collected because the cache isn’t large enough. if you are comfortable re-running resolves from scratch when you change your requirements, using/saving only the process cache in CI environments might work well for you. there is upcoming work to make the size of the process cache much easier to manage.

witty-crayon-22786

07/15/2021, 5:45 PM

i’ll likely add some of this to https://www.pantsbuild.org/docs/using-pants-in-ci, but if anyone has any questions/feedback before i do, would love to hear them!

happy-kitchen-89482

07/15/2021, 8:01 PM

Also for future readers - we solved the growth-without-bound problem of CI caches by purging them when they get too large, but that is obviously clunky.
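
The purge-when-too-large approach mentioned above can be sketched as below. This is an illustration under stated assumptions, not the script anyone in this thread actually runs: the helper names `dir_size_bytes` and `purge_if_too_large` are hypothetical, and the size threshold is something each team would pick for its own CI provider.

```python
import shutil
from pathlib import Path


def dir_size_bytes(root):
    """Total size of all regular files under root (0 if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        # Skip symlinks so linked files are not counted through the link.
        if p.is_file() and not p.is_symlink()
    )


def purge_if_too_large(cache_dir, max_bytes):
    """Delete cache_dir when it exceeds max_bytes; return True if purged."""
    if dir_size_bytes(cache_dir) <= max_bytes:
        return False
    shutil.rmtree(cache_dir)
    return True
```

Run before the CI cache is saved, for example as `purge_if_too_large(cache_dir, 1024**3)` for a 1 GB limit, so that an oversized cache is rebuilt from scratch on the next run instead of growing without bound. The directory to pass depends on where the CI job keeps its Pants caches.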

powerful-boots-1234

07/15/2021, 8:07 PM

but we are spending 7 minutes "resolving constraints.txt"

is resolution here using pip? The most recent versions of pip take far longer to resolve deps. https://github.com/pypa/pip/issues/9187

clean-city-64472

07/15/2021, 8:13 PM

I believe so. There is a separate issue where the output we get from Pants is a bit confusing - once that's resolved we'll have a clearer picture of exactly where the time is spent

witty-crayon-22786

07/15/2021, 9:17 PM

yes, it’s pip underneath.

witty-crayon-22786

07/15/2021, 9:20 PM

to get output from PEX and pip, can set `--pex-verbosity=1` (https://www.pantsbuild.org/docs/reference-pex#section-verbosity). but i think that in @clean-city-64472’s case the resolve is large enough that it isn’t worth digging into
