# random
b
I’m curious if anyone here uses hosted GitHub Actions runners with Pants; the GitHub-provided ones are okay but I’ve been looking at alternatives like https://namespace.so/ and https://www.warpbuild.com/ to see if I can use those to speed up our CI/CD times. In particular, I’m thinking that more robust disk-based caching between runs could help make things faster.
h
My unit tests take 30 minutes on an 8-core GitHub runner (when changes to core happen in a monorepo). Would love to know if there's a faster (and cheaper) alternative. Probably the cheapest is self-hosting on AWS, but that's complex as well 😅
b
Yeah, self-hosting is out of the question for me - we don’t have that kind of time / resources to go through and set everything up
💯 1
w
Do you know where you're losing the most time? Is it on pants warmup? Dep downloading? Test-time?
I use GHA, almost exclusively, and have never noticed a particular performance hit
b
Typically it’s on initializing Pants (this takes ~30-40s), and then test time is about 2-3 minutes
w
Whoa
b
We also have remote caching set up though so that saves quite a bit of time on subsequent runs
h
I have 400 packages lol so most of my time is just env stuff (even with a remote env cache)
w
Lol, yep, that makes sense. @better-van-82973 Is scie-pants itself downloaded/cached/opened?
As in, you're not getting hit by unpacking pants, are you?
b
Not sure about that, I just use the `init-pants` GitHub Action: https://github.com/pantsbuild/actions/tree/main/init-pants
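For anyone following along, a minimal workflow sketch using that action might look like the following (input names are taken from the init-pants README at the link above; the lockfile path and cache-key value are placeholders, adjust for your repo):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pantsbuild/actions/init-pants@main
        with:
          # bump this prefix to invalidate the GHA-side caches
          gha-cache-key: v0
          # key the named caches on your resolved lockfile(s)
          named-caches-hash: ${{ hashFiles('python-default.lock') }}
      - run: pants test ::
```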
w
I was about to ask 🙂
I wasn't sure how supported that is (for external projects), at the moment (I haven't looked at it recently)
👀 1
b
What I do think is really helpful with these alternative providers is that you can persistently cache specific directories if you want - for CI on Linux machines it looks like caching this directory would be extremely valuable: https://github.com/pantsbuild/actions/blob/main/init-pants/action.yaml#L121C11-L121C55
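For reference, the linked line points at Pants' named-cache directory. A sketch of persisting it directly with actions/cache, assuming the default Linux location and a made-up key scheme, could look like:

```yaml
- uses: actions/cache@v4
  with:
    # default named-caches location on Linux runners
    path: ~/.cache/pants/named_caches
    key: pants-named-caches-${{ runner.os }}-${{ hashFiles('python-default.lock') }}
    restore-keys: |
      pants-named-caches-${{ runner.os }}-
```

Alternative providers with persistent volumes would skip the upload/restore step entirely and just mount that path on fast local disk.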
w
Yeah, that's what I was alluding to - I thought that was cached already with our GHA, but I haven't checked
b
Looks like it is, from a couple lines after what I linked
w
Then is the 30-40s just the daemon bootup time?
b
About 6-7s is downloading + restoring the cache
But the rest I think is just the daemon booting up - that’s why I wanted to try using a different type of cache/runner service that offers storage on EBS/NVMe type volumes where the read time could be much faster
The actual test running seems to be markedly faster on Namespace (the first service I tried) versus vanilla GHA runners
w
Could it just be CPU bound then? GHA runners are notoriously slow 🙂
Aside: This will hopefully help in the mid-future https://github.com/pantsbuild/pants/issues/19730
b
Could it just be CPU bound then?
Maybe! Notably, my runner on GHA had 16 cores and my runner on Namespace has 8, but Namespace is still faster somehow
w
You know.... I've never checked... Is the daemon boot process multi-core?
g
A few thoughts...

We switched from temporary CI runners to persistent ones on both our Pants and non-Pants Python repos and it did wonders for our speed, especially for "initial run" work like pyenv. We run Buildkite on GKE though, not GHA. Our CI containers have a fairly large SSD-backed cache dir that lives for ~1 week.

I'd try disabling GHA caches (if you use them) and see how your CI performs as well; I find the GHA cache to be far too slow for most uses. Similarly, I've not really seen great results with remote caching -- even with a cluster-local cache I find it barely helps... the network just adds immense cost on top of everything else. It likely depends a lot on what kind of work you have, though... I do ML and Rust, both being fairly chunky (our biggest Docker image is ~10G). My Go work is much easier, but the gain from a cache is also much diminished due to small sizes and fast builds. 🤷

Re: diffs between your two hosts. Is one vCPU and the other real cores? Dedicated vs. shared? Disk speed diffs maybe? Better network?
c
We started with hosted GHA runners (and AZDO before that) and found that we basically spent as much time restoring from the cache as we saved by having it. I've heard that the implementation of the GHA (and AZDO) cache is, uh, bad.

We went to self-hosted because we have a lot of infra, and after we figured out all the silliness around the runners it's pretty sweet. Depending on how much prep you have to do for your workloads, it might be worth running your actions in a Docker container with things pre-initialised. For example, we were building a bunch of C/C++ stuff (not with Pants) and we could install the toolchain, fetch system dependencies, and maybe even precompile some libraries (idr, it was a while ago) in a base Docker image. Then pulling that image was comparatively quick.
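That base-image approach translates to GHA as running the job inside a prebuilt container; a hedged sketch (the image name, registry, and build step are hypothetical):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    # hypothetical prebuilt image with toolchain + system deps baked in
    container:
      image: ghcr.io/your-org/ci-base:latest
    steps:
      - uses: actions/checkout@v4
      - run: make build   # placeholder for your real build step
```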
b
Thanks, that's pretty helpful and part of the service that some of these providers claim to offer (rolling setup steps into a base image + caching). We find that remote caching shaves at least 45 seconds to a minute off our test run times so we'll probably keep that - it's mostly just Pytest so I'm guessing that is a big part of why caching is good.
Following up here - I was able to shave about 15s off Pants startup time (now down to ~10-15s) by switching the cache to the new provider instead of the GHA cache, so that definitely made a difference. Interestingly, I also tried Tom’s suggestion of disabling remote caching altogether, and it seemed to have little to no impact on the overall CI runtime. The most recent run in the screenshot is without remote caching; the first two are with it enabled
g
I'm super interested in this thread. I enabled remote caching (bazel-remote), and when I calculated the total size of objects uploaded to the S3 bucket it's really small (1.4MB); locally on the bazel-remote host it's only ~600MB, while the PEX artifacts I'm building are much larger, ~500MB to ~3GB. Is it possible the cache is not working properly, or that I have it misconfigured?
```toml
[GLOBAL]
pants_version = "2.19.0"
remote_cache_read = true
remote_cache_write = true
remote_store_address = "grpc://<ip-address-of-bazel-remote>:9092"
remote_instance_name = "main"
```