# general
e
Hi Pants! I've got a printout of stats at the end of my CI runs, like:
13:44:11.47 [INFO] Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
  docker_execution_requests: 0
  docker_execution_successes: 0
  local_cache_read_errors: 0
  local_cache_requests: 322
  local_cache_requests_cached: 1
  local_cache_requests_uncached: 321
  local_cache_total_time_saved_ms: 1528
  local_cache_write_errors: 0
  local_execution_requests: 321
  local_process_total_time_run_ms: 2086451
  remote_cache_read_errors: 0
  remote_cache_request_timeouts: 0
  remote_cache_requests: 317
  remote_cache_requests_cached: 0
  remote_cache_requests_uncached: 317
  remote_cache_speculation_local_completed_first: 0
  remote_cache_speculation_remote_completed_first: 0
  remote_cache_total_time_saved_ms: 0
  remote_cache_write_attempts: 313
  remote_cache_write_errors: 311
  remote_cache_write_successes: 2
  remote_execution_errors: 0
  remote_execution_requests: 0
  remote_execution_rpc_errors: 0
  remote_execution_rpc_execute: 0
  remote_execution_rpc_retries: 0
  remote_execution_rpc_wait_execution: 0
  remote_execution_success: 0
  remote_execution_timeouts: 0
  remote_process_total_time_run_ms: 0
  remote_store_exists_attempts: 1625
  remote_store_exists_errors: 315
  remote_store_exists_successes: 1259
  remote_store_missing_digest: 0
  remote_store_read_attempts: 0
  remote_store_read_cached: 0
  remote_store_read_errors: 0
  remote_store_read_uncached: 0
  remote_store_request_timeouts: 0
  remote_store_write_attempts: 630
  remote_store_write_errors: 0
  remote_store_write_successes: 630
Is there a guide somewhere as to how to interpret this? In particular, I'm curious about the difference between remote_store_* and remote_cache_*, but would appreciate any docs that have details on everything.
h
Hmm, not sure we document this anywhere, but the source may be informative
“remote store” is cached data, “remote cache” is cached computation
That is, files are in the store, process results are in the cache
“speculation” is where we concurrently query the cache and run the process, and if cache returns first we cancel the process
So for example, in your metrics above, one counter of concern is remote_cache_write_errors: you had a lot of them (311 errors out of 313 write attempts).
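To make the store/cache split and speculation concrete, here is a rough Python sketch of the idea. This is not Pants' actual engine code (which is written in Rust); the function names and timings are made up for illustration:
```python
import asyncio

async def remote_cache_lookup(process: str):
    """Ask the remote *cache*: has this exact process already been run somewhere?"""
    await asyncio.sleep(0.1)  # simulated network round-trip
    return None  # None means a cache miss

async def run_locally(process: str):
    """Actually run the process; its output files are what would live in the *store*."""
    await asyncio.sleep(0.5)  # simulated work
    return f"result of {process}"

async def execute_with_speculation(process: str):
    # Kick off the cache lookup and the local run at the same time.
    cache_task = asyncio.create_task(remote_cache_lookup(process))
    local_task = asyncio.create_task(run_locally(process))
    await asyncio.wait({cache_task, local_task}, return_when=asyncio.FIRST_COMPLETED)
    if cache_task.done() and cache_task.result() is not None:
        # Cache hit arrived first: cancel the in-flight local run
        # (roughly remote_cache_speculation_remote_completed_first).
        local_task.cancel()
        return cache_task.result()
    # Cache miss, or the local run finished first
    # (roughly remote_cache_speculation_local_completed_first): use the local result.
    result = await local_task
    cache_task.cancel()
    return result

print(asyncio.run(execute_with_speculation("Parse Dockerfile.")))
```
The point is just that the cache answers "has this process already been run?" while the store holds the actual bytes of the files involved.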
p
If you are using the GitHub Actions Cache (documented here: https://www.pantsbuild.org/stable/docs/using-pants/remote-caching-and-execution/remote-caching#github-actions-cache) as your remote cache, then remote_cache_*_errors will be very common because of GitHub's aggressive rate limiting (the cache was designed for a few large requests rather than many tiny ones). If that's your setup, I recommend the following setting to silence all of those rate-limit errors in the log (the stats will still count them):
[GLOBAL]
# https://www.pantsbuild.org/stable/reference/global-options#ignore_warnings
ignore_warnings = [
    # remote cache errors caused by GitHub rate-limits are not helpful
    "Failed to read from remote cache",
    "Failed to write to remote cache",
]
e
Thanks for the information. I am trying to debug the write errors right now. We are using GitHub Actions Cache (this is my first go at implementing it, not a regression). As far as I understand it, the errors are not due to rate limits: we are on an enterprise server, not regular github.com, and I have been told that no rate limiting is enabled. Running with -ldebug, I see two types of error messages in the logs. On cache read:
2024-12-12T11:56:29.1458705Z 11:56:29.14 [32m[DEBUG][0m Completed: Remote cache lookup for: Parse Dockerfile.
2024-12-12T11:56:29.1478300Z 11:56:29.14 [33m[WARN][0m Failed to read from remote cache (1 occurrences so far): failed to read pants_ci_cache/action-cache/ca/4b/ca4be85b2bd62dd89d13365599166268394a755205426cd5ae6a5a4a16a0a9a7: Unexpected (persistent) at read, context: { uri: <https://ghe-actions-prod-qhqyjglk.s3.amazonaws.com/actions-69c8c8939b70/9bb02394953d4d45a28a6ccad6554933/2287473ef36b141086d40095c0eef846?AWSAccessKeyId=AKIAQ3EGVTWOBBE5CS57&Expires=1734008189&Signature=%2FuisMdyUYmmJAZnaIY1wYipX3aA%3D>, response: Parts { status: 400, version: HTTP/1.1, headers: {"x-amz-request-id": "HWGMKQ63G92CDVCP", "x-amz-id-2": "uzYJI+u7NwzzoAsb2hx/8fL5FwImgpR5h7OWsKDgJAUAT2v8hrwTg4QrawIysFSStZAF4ZStB/EhNNDmLcI0Sg==", "x-amz-region": "us-east-1", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Thu, 12 Dec 2024 11:56:28 GMT", "connection": "close", "server": "AmazonS3"} }, service: ghac, path: pants_ci_cache/action-cache/ca/4b/ca4be85b2bd62dd89d13365599166268394a755205426cd5ae6a5a4a16a0a9a7, range: 0- } => <?xml version="1.0" encoding="UTF-8"?>
2024-12-12T11:56:29.1491354Z <Error><Code>InvalidArgument</Code><Message>Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>null</ArgumentValue><RequestId>HWGMKQ63G92CDVCP</RequestId><HostId>uzYJI+u7NwzzoAsb2hx/8fL5FwImgpR5h7OWsKDgJAUAT2v8hrwTg4QrawIysFSStZAF4ZStB/EhNNDmLcI0Sg==</HostId></Error>
and on write:
2024-12-12T11:56:31.7312720Z 11:56:31.72 [33m[WARN][0m Failed to write to remote cache (1 occurrences so far): failed to query pants_ci_cache/byte-store/39/72/3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986: PermissionDenied (persistent) at stat, context: { uri: <https://ghe-actions-prod-qhqyjglk.s3.amazonaws.com/actions-69c8c8939b70/9bb02394953d4d45a28a6ccad6554933/fe53473ef36b141086d40095c0eef846?AWSAccessKeyId=AKIAQ3EGVTWOBBE5CS57&Expires=1734008192&Signature=AUEETO5HZ7A4zme0gmdgHKRVaZI%3D>, response: Parts { status: 403, version: HTTP/1.1, headers: {"x-amz-request-id": "CNFQBVK7PCH1Q01N", "x-amz-id-2": "hJmW2N8uzccc3hWjO/6PhkG9WoL2PIgIu/blCKmXQi+hIG8DwNfcvE2MK/aRijU/f3hvGh8yGoM=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Thu, 12 Dec 2024 11:56:30 GMT", "server": "AmazonS3"} }, service: ghac, path: pants_ci_cache/byte-store/39/72/3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986 }
From these, I gathered that the backing storage for our GHA cache is an S3 bucket, and I did read that AWS uses 503 errors for rate limits; since the responses here are 400 and 403, that suggests the errors are not due to underlying rate limits either (i.e. rate limits somewhere else in the chain than GHA itself). The read errors mention "Unexpected (persistent) at read, ... Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4." and the write errors mention "PermissionDenied (persistent) at stat". I'm a bit confused by these, since they both seem to suggest issues with auth/token/signature, and I wouldn't expect those to be sources of intermittent errors (I had both successes and failures for read and write). I have a meeting today with the team that manages our GitHub instance, so I'm trying to gather as much information as I can in preparation.
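One thing I can do to prepare: the failing requests are plain presigned S3 URLs, so I can replay one directly, outside of Pants, and see whether the same error comes back. A rough, hypothetical snippet for that, assuming I copy a uri value out of the -ldebug output before its Expires time passes:
```python
import sys
import urllib.error
import urllib.request

# Hypothetical helper: replay a presigned S3 URL copied from the -ldebug output
# and print the raw S3 response, so the failure can be reproduced independently
# of Pants and shown to the GHE/infra team.
def replay(presigned_url: str) -> None:
    request = urllib.request.Request(presigned_url, method="GET")
    try:
        with urllib.request.urlopen(request) as response:
            print("status:", response.status)
            print(response.read(500).decode(errors="replace"))
    except urllib.error.HTTPError as err:
        # S3 puts the reason in an XML body, e.g. InvalidArgument or AccessDenied.
        print("status:", err.code)
        print(err.read(500).decode(errors="replace"))

if __name__ == "__main__":
    replay(sys.argv[1])  # pass the presigned URL, quoted, as the only argument
```
If the replay fails the same way, that would point at how the cache backend presigns its URLs (the AWSAccessKeyId/Signature/Expires query parameters look like the older Signature Version 2 style, while the KMS-encrypted bucket appears to require Signature Version 4) rather than at Pants or rate limiting.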