# random
b
I’m curious if anyone here uses hosted GitHub Actions runners with Pants; the GitHub-provided ones are okay but I’ve been looking at alternatives like https://namespace.so/ and https://www.warpbuild.com/ to see if I can use those to speed up our CI/CD times. In particular, I’m thinking that more robust disk-based caching between runs could help make things faster.
h
My unit tests take 30 minutes on an 8-core GitHub runner (when changes to core happen in a monorepo). Would love to know if there's a faster (and cheaper) alternative. Probably the cheapest is self-hosting on AWS, but that's complex as well 😅
b
Yeah, self-hosting is out of the question for me - we don’t have that kind of time / resources to go through and set everything up
💯 1
w
Do you know where you're losing the most time? Is it on pants warmup? Dep downloading? Test-time?
I use GHA, almost exclusively, and have never noticed a particular performance hit
b
Typically it’s on initializing Pants (this takes ~30-40s), and then test time is about 2-3 minutes
w
Whoa
b
We also have remote caching set up though so that saves quite a bit of time on subsequent runs
h
I have 400 packages lol so most of my time is just env stuff (even with a remote env cache)
w
Lol, yep, that makes sense. @better-van-82973 Is scie-pants itself downloaded/cached/opened?
As in, you're not getting hit by unpacking pants, are you?
b
Not sure about that, I just use the `init-pants` GitHub Action: https://github.com/pantsbuild/actions/tree/main/init-pants
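For anyone following along, a minimal workflow sketch using that action might look like the following (input names are taken from the init-pants README at the link above; the lockfile path and cache-key value are placeholders, adjust for your repo):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pantsbuild/actions/init-pants@main
        with:
          # bump this prefix to invalidate the GHA-side caches
          gha-cache-key: v0
          # key the named caches on your resolved lockfile(s)
          named-caches-hash: ${{ hashFiles('python-default.lock') }}
      - run: pants test ::
```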
w
I was about to ask 🙂
I wasn't sure how supported that is (for external projects), at the moment (I haven't looked at it recently)
👀 1
b
What I do think is really helpful with these alternative providers is that you can persistently cache specific directories if you want - for CI on Linux machines it looks like caching this directory would be extremely valuable: https://github.com/pantsbuild/actions/blob/main/init-pants/action.yaml#L121C11-L121C55
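For reference, the linked line points at Pants' named-cache directory. A sketch of persisting it directly with actions/cache, assuming the default Linux location and a made-up key scheme, could look like:

```yaml
- uses: actions/cache@v4
  with:
    # default named-caches location on Linux runners
    path: ~/.cache/pants/named_caches
    key: pants-named-caches-${{ runner.os }}-${{ hashFiles('python-default.lock') }}
    restore-keys: |
      pants-named-caches-${{ runner.os }}-
```

Alternative providers with persistent volumes would skip the upload/restore step entirely and just mount that path on fast local disk.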
w
Yeah, that's what I was alluding to - I thought that was cached already with our GHA, but I haven't checked
b
Looks like it is, from a couple lines after what I linked
w
Then is the 30-40s just the daemon bootup time?
b
About 6-7s is downloading + restoring the cache
But the rest I think is just the daemon booting up - that’s why I wanted to try using a different type of cache/runner service that offers storage on EBS/NVMe type volumes where the read time could be much faster
The actual test running seems to be markedly faster on Namespace (the first service I tried) versus vanilla GHA runners
w
Could it just be CPU bound then? GHA runners are notoriously slow 🙂
Aside: This will hopefully help in the mid-future https://github.com/pantsbuild/pants/issues/19730
b
Could it just be CPU bound then?
Maybe! Notably, my runner on GHA had 16 cores and my runner on Namespace has 8, but Namespace is still faster somehow
w
You know.... I've never checked... Is the daemon boot process multi-core?
g
A few thoughts...

We switched from temporary CI runners to persistent ones on both our Pants and non-Pants Python repos and it did wonders for our speed, especially for "initial run" work like pyenv. We run Buildkite on GKE though, not GHA. Our CI containers have a fairly large SSD-backed cache dir that lives for ~1 week.

I'd try disabling GHA caches (if you use them) and see how your CI performs as well; I find the GHA cache to be far too slow for most uses. Similarly, I've not really seen great results with remote caching -- even with a cluster-local cache I find it barely helps... the network just adds immense cost on top of everything else. It likely depends a lot on what kind of work you have, though... I do ML and Rust, both being fairly chunky (our biggest Docker image is ~10G). My Go work is much easier, but the gain from a cache is also much diminished due to small sizes and fast builds. 🤷

Re: diffs between your two hosts. Is one vCPU and the other real cores? Dedicated vs. shared? Disk speed diffs maybe? Better network?
c
We started with hosted GHA runners (and AZDO before that) and found that we basically spent as much time restoring from the cache as we saved by having it. I've heard that the implementation of the GHA (and AZDO) cache is, uh, bad.

We went to self-hosted because we have a lot of infra, and after we figured out all the silliness around the runners it's pretty sweet. Depending on how much prep you have to do for your workloads, it might be worth running your actions in a Docker container with things pre-initialised. For example, we were building a bunch of C/C++ stuff (not with Pants) and we could install the toolchain, fetch system dependencies, and maybe even precompile some libraries (idr, it was a while ago) in a base Docker image. Then pulling that image was comparatively quick.
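That base-image approach translates to GHA as running the job inside a prebuilt container; a hedged sketch (the image name, registry, and build step are hypothetical):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    # hypothetical prebuilt image with toolchain + system deps baked in
    container:
      image: ghcr.io/your-org/ci-base:latest
    steps:
      - uses: actions/checkout@v4
      - run: make build   # placeholder for your real build step
```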
b
Thanks, that's pretty helpful and part of the service that some of these providers claim to offer (rolling setup steps into a base image + caching). We find that remote caching shaves at least 45 seconds to a minute off our test run times so we'll probably keep that - it's mostly just Pytest so I'm guessing that is a big part of why caching is good.
Following up here - I was able to shave about 15s off Pants startup time (now down to ~10-15s) by switching the cache to the new provider instead of the GHA cache, so that definitely made a difference. Interestingly, I also tried Tom’s suggestion of disabling remote caching altogether, and it seemed to have little to no impact on the overall CI runtime. The most recent run in the screenshot is without remote caching; the first two are with it enabled
g
I'm super interested in this thread. I enabled remote caching (bazel-remote), and when I calculated the total size of objects uploaded to the S3 bucket it's really small (1.4MB); locally on the bazel-remote host it's only ~600MB, while the PEX artifacts I'm building are much larger, ~500MB to ~3GB. Is it possible the cache is not working properly, or that I have it misconfigured?
```toml
[GLOBAL]
pants_version = "2.19.0"
remote_cache_read = true
remote_cache_write = true
remote_store_address = "grpc://<ip-address-of-bazel-remote>:9092"
remote_instance_name = "main"
```