# general
r
Hey, so I started setting up Pants in CI, and from reading the docs (https://www.pantsbuild.org/docs/using-pants-in-ci) it's recommended to cache the `$HOME/.cache/pants` folder between steps/builds for faster execution. The catch is that even when running a `./pants` goal on only changed targets (with `--changed-since=origin/main`), Pants starts by downloading all dependencies in the impacted lockfiles, and with a `data-science` resolve that can easily reach ~10 GB of cache to upload/download. Is there any way around this, or am I doing something wrong? ^^ It's actually slower to push/pull 10 GB of cache each time.
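For illustration, roughly what that setup looks like as a CI step, assuming GitHub Actions (the CI system isn't named here; the cache key is illustrative):

```yaml
# Naive approach from the docs: stash the whole Pants cache dir between
# builds. With a large resolve this archives ~10 GB on every run.
- uses: actions/cache@v3
  with:
    path: ~/.cache/pants
    key: pants-${{ runner.os }}-${{ github.sha }}
    restore-keys: pants-${{ runner.os }}-

- name: Test changed targets
  run: ./pants --changed-since=origin/main test
```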
w
stashing portions of the cache is definitely an option. if you keep only `$HOME/.cache/pants/lmdb_store` (for example), you will be able to hit for exact matches, but you won't keep a generic pip cache (under `$HOME/.cache/pants/named_caches`), and will need to re-resolve from scratch after requirements change
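a sketch of what that narrower stash could look like on GitHub Actions (illustrative key; `lmdb_store` hits on exact content digests, so the key mostly controls when a fresh archive is uploaded):

```yaml
# Keep only the content-addressed store. pip's wheel cache under
# named_caches is dropped, so a requirements change re-resolves from scratch.
- uses: actions/cache@v3
  with:
    path: ~/.cache/pants/lmdb_store
    key: pants-lmdb-${{ runner.os }}-${{ github.sha }}
    restore-keys: pants-lmdb-${{ runner.os }}-
```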
but also: pants supports native remote caching, which uploads/downloads precise artifacts, avoiding the need to stash and restore the whole local cache directory in CI
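a minimal sketch of enabling that in `pants.toml` (option names as in recent Pants 2.x releases; the address is a placeholder for whatever REAPI-compatible server you run):

```toml
# pants.toml: read/write process results through a remote REAPI cache.
[GLOBAL]
remote_cache_read = true
remote_cache_write = true
# placeholder address; any REAPI-compatible store works here
remote_store_address = "grpc://cache.example.internal:9092"
```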
r
hmm, but that won't be particularly helpful when running on only changed targets, since they have changed and will need the pip cache to run
w
@rapid-crayon-8232: not quite: if your requirements have not changed, then the resolve will not have changed, and can stay in the cache
you can hit the cache for your third-party requirements, and then that cached resolve will be consumed to actually run the changed targets
p
perhaps use the hash of the requirements file as part of the CI cache key. For example, on GitHub Actions you can use the hashFiles expression: https://docs.github.com/en/actions/learn-github-actions/expressions#hashfiles Other CI systems have equivalent mechanisms.
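for example, keying the pip cache on the requirements file (GitHub Actions syntax; the requirements path is hypothetical):

```yaml
# Re-upload named_caches only when third-party requirements change.
- uses: actions/cache@v3
  with:
    path: ~/.cache/pants/named_caches
    key: pants-named-caches-${{ hashFiles('3rdparty/python/requirements.txt') }}
```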
h
@rapid-crayon-8232 Our company (Toolchain) offers remote caching as a service, if you're interested in trying that out!
r
I had the same question a few days ago when integrating with GitHub Actions. I noticed the cache action was tarring/uploading ~7 GB when I naively cached the entire `lmdb_store` and `named_caches` directories. We use self-hosted runners on Azure Kubernetes, so we ended up mounting a persistent volume in each runner to persist the cache locally between runs. Works well for our specific use case.
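a rough sketch of that pod setup (the names and runner image are hypothetical; the point is just a persistent volume mounted at the Pants cache path):

```yaml
# Abridged runner pod spec: the Pants cache lives on a PVC, so it
# survives between workflow runs instead of being tarred and uploaded.
apiVersion: v1
kind: Pod
metadata:
  name: gha-runner
spec:
  containers:
    - name: runner
      image: my-org/actions-runner:latest  # hypothetical runner image
      volumeMounts:
        - name: pants-cache
          mountPath: /home/runner/.cache/pants
  volumes:
    - name: pants-cache
      persistentVolumeClaim:
        claimName: pants-cache-pvc  # hypothetical claim
```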
h
Are y'all doing the caching action yourself instead of using a tool that's compatible with REAPI, like the docs describe?
h
We implement REAPI
Since that's what it's for 🙂
h
Just to stress how flexible the Pants setup is: we were able to integrate it with an already existing deployment of https://github.com/buchgr/bazel-remote with no changes.
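for anyone wanting to try the same, a sketch of standing up bazel-remote and pointing Pants at it (per the bazel-remote README, gRPC defaults to port 9092; the data path and size cap are illustrative):

```yaml
# docker-compose.yml: a bazel-remote instance exposing its gRPC port.
# Pants then sets remote_store_address = "grpc://localhost:9092".
services:
  bazel-remote:
    image: buchgr/bazel-remote
    command: ["--max_size=50"]  # cache size cap in GiB
    volumes:
      - ./bazel-remote-data:/data
    ports:
      - "9092:9092"  # gRPC (REAPI) endpoint
```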
Sorry @happy-kitchen-89482, my question was meant for @rhythmic-battery-45198 and others 🙂
h
Oh, hah 🙂
r
@rhythmic-battery-45198 we ended up doing the same, since our runners are already self-hosted and it was pretty easy to set up ^^ @witty-crayon-22786 good to know; dependencies indeed don't change very often, so it's a very doable solution. @happy-kitchen-89482 thanks, we're still evaluating how we want to set up our monorepo right now, but it's definitely an option for the future.
r
@high-yak-85899 We don't have an explicit cache step as part of the GitHub Actions workflow; we removed it since the cache directory is persisted on the runner VM's disk. I intend to look into remote caching when I have more time, but went this route as a quick solution.