# general
e
I submitted a bug https://github.com/pantsbuild/pants/issues/21764 for the CI caching issue I've been seeing at my org. I suspect it might be an easy fix (fingers crossed, anyways) for someone with some experience with the GitHub Actions cache code.
I'd be interested in trying to work on this, but I don't really know much about rust, so I'd need a lot of guidance
t
Is the remote cache an EC2 instance? Never mind. I just realized it was GH Actions cache.
Also, what does your `pants.toml` config look like?
e
```toml
[GLOBAL]
colors = true
remote_provider = "experimental-github-actions-cache"
remote_cache_read = true
remote_cache_write = true
remote_instance_name = "pants_ci_cache"

[stats]
log = true
```
That's my `pants.ci.toml`; `pants.toml` has a bunch of tool config stuff, but nothing that should affect caching. (version 2.22)
t
What type of cache action are you using to provide S3 backing?
e
Followed the setup at https://www.pantsbuild.org/prerelease/docs/using-pants/remote-caching-and-execution/remote-caching#github-actions-cache, so no actions, just getting the URL/token from GitHub itself. It's a GitHub Enterprise Server managed by my org. I don't know tons about how they've configured it, but as I see it, they had to hook GitHub up to some sort of storage backend and they chose an S3 bucket. When I get `process.env.ACTIONS_CACHE_URL` from GitHub, the URL is https://github.software.gevernova.com/_services/pipelines/ocrHE8ZScyxq8pTiT1IyKRHNwdDtYL7HOvJRB2oXuk7JCGeFXS/ and pants logs errors that mention AWS things, so I'm assuming that URL is more or less a proxy to an S3 bucket.
t
It's difficult to know if it's a credential propagation issue within GH Actions, or if it's pants-related. S3 storage for GH is on a per-server basis. If the OIDC-based auth configuration is used, it should provide correct JIT credentials at runtime. If it's using static access keys, they can be rotated within AWS and cause access failures. If you're sure the credentials are correct (e.g. they work for other jobs and you see the writes in S3), then there may be something in the pants implementation that's not taking AWS credentials into account, but it's hard to tell. Do you get any more information when you run the same `pants` command with `-ldebug`?
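One way to separate the two would be to hit the cache service directly from a job step and see what comes back. A minimal Node/TypeScript sketch (nothing pants ships): it assumes the standard actions/cache endpoint layout (`_apis/artifactcache/cache`) and that `ACTIONS_CACHE_URL` / `ACTIONS_RUNTIME_TOKEN` are exposed to the step, so treat the path, headers, and key names as assumptions to verify against your GHES instance.
```typescript
// probe.ts — poke the Actions cache service the same way a cache client would.
// Run inside a workflow step (e.g. `npx tsx probe.ts`); Node 18+ for global fetch.
const base = process.env.ACTIONS_CACHE_URL;   // e.g. https://<ghes-host>/_services/pipelines/<token>/
const token = process.env.ACTIONS_RUNTIME_TOKEN;

async function probe(): Promise<void> {
  if (!base || !token) {
    throw new Error("ACTIONS_CACHE_URL / ACTIONS_RUNTIME_TOKEN are not exposed to this step");
  }
  // Look up an arbitrary key; the point is to see how the server (and whatever
  // S3 backend sits behind it) responds, not to get a hit.
  const url = `${base}_apis/artifactcache/cache?keys=pants_ci_cache-probe&version=probe`;
  const res = await fetch(url, {
    headers: {
      Accept: "application/json;api-version=6.0-preview.1",
      Authorization: `Bearer ${token}`,
    },
  });
  // Rough expectation: 204 (no entry, auth OK) or 200 (hit); a 400/403 here should
  // mirror whatever pants is running into.
  console.log(res.status, res.statusText);
  console.log(await res.text());
}

probe().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
A 204 there would suggest the token and URL are fine and the problem is further down the stack; a 400/403 with an S3-flavoured body would point back at the server/storage configuration.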
e
There wasn't really much more information with `-ldebug`. It showed a lot of the "starting remote cache lookup for <xyz>" and "remote cache lookup completed" before the error messages, but nothing more than that.
I am able to see in the GH UI (<repo>/actions/caches) that there are a number of caches being created. (pants_ci_cache is my custom prefix, and the rest is provided by pants)
This seems to match with what I see in the stats output, indicating some successes, but many read/write failures
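(If it's useful to capture that list outside the UI, e.g. as evidence for the admin team, something like this against the REST cache-listing endpoint should work. A sketch only: it assumes a token that can read Actions data for the repo; `OWNER`/`REPO` are placeholders, and on GHES the API root comes from `GITHUB_API_URL` rather than api.github.com.)
```typescript
// list-caches.ts — dump the Actions caches recorded for a repository.
// Sketch: assumes GITHUB_TOKEN can read Actions data for the repo.
const apiBase = process.env.GITHUB_API_URL ?? "https://api.github.com";
const token = process.env.GITHUB_TOKEN;

async function listCaches(owner: string, repo: string): Promise<void> {
  const res = await fetch(`${apiBase}/repos/${owner}/${repo}/actions/caches?per_page=100`, {
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
  });
  if (!res.ok) {
    throw new Error(`listing caches failed: ${res.status} ${res.statusText}`);
  }
  const body = await res.json();
  // Each entry should carry the key (e.g. the pants_ci_cache-prefixed ones), size, and timestamps.
  for (const cache of body.actions_caches ?? []) {
    console.log(cache.key, cache.size_in_bytes, cache.last_accessed_at);
  }
}

// OWNER and REPO are placeholders for the actual org/repo.
listCaches("OWNER", "REPO").catch((err) => {
  console.error(err);
  process.exit(1);
});
```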
t
Can you provide the `[stats]` output from a run?
I'm curious if you're hitting GHA rate limits.
e
```
20:30:35.87 [INFO] Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
  docker_execution_requests: 0
  docker_execution_successes: 0
  local_cache_read_errors: 0
  local_cache_requests: 324
  local_cache_requests_cached: 1
  local_cache_requests_uncached: 323
  local_cache_total_time_saved_ms: 2215
  local_cache_write_errors: 0
  local_execution_requests: 323
  local_process_total_time_run_ms: 1771296
  remote_cache_read_errors: 0
  remote_cache_request_timeouts: 0
  remote_cache_requests: 319
  remote_cache_requests_cached: 0
  remote_cache_requests_uncached: 319
  remote_cache_speculation_local_completed_first: 0
  remote_cache_speculation_remote_completed_first: 0
  remote_cache_total_time_saved_ms: 0
  remote_cache_write_attempts: 318
  remote_cache_write_errors: 316
  remote_cache_write_successes: 2
  remote_execution_errors: 0
  remote_execution_requests: 0
  remote_execution_rpc_errors: 0
  remote_execution_rpc_execute: 0
  remote_execution_rpc_retries: 0
  remote_execution_rpc_wait_execution: 0
  remote_execution_success: 0
  remote_execution_timeouts: 0
  remote_process_total_time_run_ms: 0
  remote_store_exists_attempts: 1677
  remote_store_exists_errors: 316
  remote_store_exists_successes: 1317
  remote_store_missing_digest: 0
  remote_store_read_attempts: 0
  remote_store_read_cached: 0
  remote_store_read_errors: 0
  remote_store_read_uncached: 0
  remote_store_request_timeouts: 0
  remote_store_write_attempts: 640
  remote_store_write_errors: 0
  remote_store_write_successes: 640
```
Most recent run. I don't think it's GHA rate limits. It's an enterprise server, and I've been told rate limits have not been enabled. Not to mention the errors are 400 and 403 errors.
t
It's still possible to enable GHA rate limits in GH Enterprise, it's just not enabled by default. You're getting SOME successful cache writes, which leads me to believe the credentials are not the issue.
e
From the sounds of things on my end, the GH admin team does not want to be bothered unless I've got pretty solid evidence they need to fix things. Any idea how I could gather some more information here?
Also, it is worth noting that despite having successful cache writes, I don't get a single test that is skipped due to reading the cache. The `test succeeded in 5.1 seconds` output never mentions the cache in my CI.
@broad-processor-92400 noticed that OpenDAL has addressed this issue and that it will likely be fixed by upgrading the version of OpenDAL that pants uses. I have a PR up for this now.
h
Thanks! Link to PR?
b
t
Now that I've got my personal Gitea instance running with Gitea Actions (modeled after GHA), I'll be running some tests this weekend with the cache and pants. It'll be great to share a cache with runners to reduce pants bootstrap time during runs.