# general
r
So I'm attempting to set up remote caching for Pants with Buildbarn and it works great locally, but I'm seeing a ton of TCP timeouts when running from our self-hosted GitHub runners. I'm pretty far out of my depth attempting to debug this, does anyone have any ideas on what could cause these? I've also posed this question on the Buildbarn Slack and they seem confident that it's a networking config issue somewhere on my end, but I don't even really know where to start trying to figure it out.
It seems like Pants is firing off a ton of requests to the remote cache at once and the cache is getting overwhelmed.
For example, when running our unit tests:
remote_cache_read_errors: 328
remote_cache_request_timeouts: 993
remote_cache_requests: 345
remote_cache_requests_cached: 0
remote_cache_requests_uncached: 0
remote_cache_speculation_local_completed_first: 17
remote_cache_speculation_remote_completed_first: 0
remote_cache_total_time_saved_ms: 0
remote_cache_write_attempts: 345
remote_cache_write_errors: 96
remote_cache_write_successes: 0
b
Ah, if it’s potentially too many requests, does setting remote_store_rpc_concurrency to something smaller help? https://www.pantsbuild.org/docs/reference-global
r
I'm also positive it's actually something with our service mesh but that's good to know! Networking is hard 🫠
f
fyi, in case it could prove to be helpful: https://github.com/toolchainlabs/remote-api-tools has a synthetic load testing tool for REAPI remote caches. That would get you repeatable sets of requests.
l
I am using Buildbarn too (not on GitHub) and I am setting remote_cache_rpc_timeout_millis to 5000 (default is 1500). Worth trying that too.
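For reference, a rough sketch of how the two suggestions in this thread might look together in pants.toml. The 5000 timeout is the value mentioned above; the concurrency value of 32 is purely an illustrative assumption, not a recommendation, so tune it against your own cache and network.

```toml
[GLOBAL]
# Give slow remote cache responses more time before they count as timeouts.
# 5000 is the value from this thread; the default mentioned above is 1500.
remote_cache_rpc_timeout_millis = 5000

# Reduce how many remote store RPCs run concurrently.
# 32 is a hypothetical value chosen only for illustration.
remote_store_rpc_concurrency = 32
```

If the timeouts drop after lowering concurrency, that would point toward the cache or the network path being saturated rather than individual requests being slow.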