I have setup buildbarn and I am having some issues...
# general
s
I have setup buildbarn and I am having some issues with caching in remote mode. wondering if anyone could help. Buildbarn itself seems to be working,
The issue I am having is that when running with envirionment = "remote", caching appears not to be working
I see some messages like this in debug mode:
Copy code
execute_request=ExecuteRequest { 
  instance_name: "main", 
  skip_cache_lookup: true,  <-- This is the problem
  action_digest: Some(Digest { hash: "...", size_bytes: 142 }), 
  ...
}
Copy code
remote_cache_warnings = "always"
cache_content_behavior = "validate" #tried multiple values

remote_instance_name = "main"

remote_cache_read = true
remote_cache_write = true
remote_store_address = "<grpc://frontend.buildbarn:8980>"

remote_execution = true
remote_execution_address = "<grpc://frontend.buildbarn:8980>"
Copy code
14:08:48.13 [DEBUG] built REv2 request (build_id=pants_run_2025_07_11_14_08_42_570_6a0d382e5e794ea8bfdfc0a6004292c2): action=Action { command_digest: Some(Digest { hash: "3a023af49581d356f3fbb1713789a6bc5ad6bc7be8ecaaf473b79d291c65a22f", size_bytes: 1315 }), input_root_digest: Some(Digest { hash: "57ceb377ab5dc55043025a5ae5005b976b64fca5fd608cbec4c4d666f99ed22a", size_bytes: 335 }), timeout: None, do_not_cache: false, salt: b"", platform: None }; command=Command { arguments: ["/bin/python3", "./pex", "--tmpdir", ".tmp", "--no-emit-warnings", "--pip-version", "24.2", "--python-path", "/usr/bin:/bin:/usr/local/bin:/opt/homebrew/bin:/home/runner/.pyenv/shims:/usr/bin:/bin", "--output-file", "projects.zeopp/binary-srcs.pex", "--emit-warnings", "--check=warn", "--include-tools", "--requirements-pex", "local_dists.pex", "--interpreter-constraint", "CPython<3.13,>=3.11", "--sources-directory=source_files", "--no-pypi", "--index=<https://pypi.org/simple/>", "--index=xxxxxxxx", "--index=<https://download.pytorch.org/whl/cu124>", "--find-links=<https://data.pyg.org/whl/torch-2.6.0%2Bcu124.html>", "--manylinux", "manylinux2014", "--resolver-version", "pip-2020-resolver", "--layout", "packed"], environment_variables: [EnvironmentVariable { name: "CPPFLAGS", value: "" }, EnvironmentVariable { name: "GOOGLE_APPLICATION_CREDENTIALS", value: "/run/secrets/gcp" }, EnvironmentVariable { name: "LDFLAGS", value: "" }, EnvironmentVariable { name: "PANTS_CACHE_KEY_EXECUTION_STRATEGY", value: "remote_execution" }, EnvironmentVariable { name: "PANTS_CACHE_KEY_TARGET_PLATFORM", value: "linux_x86_64" }, EnvironmentVariable { name: "PATH", value: "/usr/bin:/bin:/usr/local/bin:/opt/homebrew/bin" }, EnvironmentVariable { name: "PEX_IGNORE_RCFILES", value: "true" }, EnvironmentVariable { name: "PEX_PYTHON", value: "/bin/python3" }, EnvironmentVariable { name: "PEX_ROOT", value: ".cache/pex_root" }], output_files: [], output_directories: ["projects.zeopp/binary-srcs.pex"], output_paths: [], platform: Some(Platform { properties: [] }), working_directory: "", output_node_properties: [] }; execute_request=ExecuteRequest { instance_name: "main", skip_cache_lookup: true, action_digest: Some(Digest { hash: "41358c5100f26c7a575cac887a3c6706cb109f33c7395c0a550a683d82c2fbfb", size_bytes: 142 }), execution_policy: None, results_cache_policy: None }
14:08:48.14 [DEBUG] no current operation: submitting execute request: build_id=pants_run_2025_07_11_14_08_42_570_6a0d382e5e794ea8bfdfc0a6004292c2; execute_request=ExecuteRequest { instance_name: "main", skip_cache_lookup: true, action_digest: Some(Digest { hash: "41358c5100f26c7a575cac887a3c6706cb109f33c7395c0a550a683d82c2fbfb", size_bytes: 142 }), execution_policy: None, results_cache_policy: None }
14:08:48.14 [DEBUG] no current operation: submitting execute request: build_id=pants_run_2025_07_11_14_08_42_570_6a0d382e5e794ea8bfdfc0a6004292c2; execute_request=ExecuteRequest { instance_name: "main", skip_cache_lookup: true, action_digest: Some(Digest { hash: "07ad1719ece98fac57414db3f709d9baabe7f6dd6de69dd861b4270020a51cf3", size_bytes: 142 }), execution_policy: None, results_cache_policy: None }
14:08:48.16 [DEBUG] Completed: Remote cache lookup for: Building 4 requirements for projects.zeopp/binary-deps.pex from the third-party/resolves/python-default/pex.lock resolve: aiofiles>=24.1.0, flytekit>=1.15, pydantic-settings>=... (23 characters truncated)
14:08:48.16 [DEBUG] remote cache miss for: "Building 4 requirements for projects.zeopp/binary-deps.pex from the third-party/resolves/python-default/pex.lock resolve: aiofiles>=24.1.0, flytekit>=1.15, pydantic-settings>=2.7.1, pydantic>=2.10.2" digest=Digest { hash: Fingerprint<3f16609d990f9de7d5dab11c3b319d9524c38f984dd6e394094362a3960859b1>, size_bytes: 142 }
14:08:48.16 [DEBUG] Completed: Remote cache lookup for: Building projects.zeopp/binary-srcs.pex
14:08:48.16 [DEBUG] remote cache miss for: "Building projects.zeopp/binary-srcs.pex" digest=Digest { hash: Fingerprint<41358c5100f26c7a575cac887a3c6706cb109f33c7395c0a550a683d82c2fbfb>, size_bytes: 142 }
14:08:50.40 [INFO] Completed: Building projects.zeopp/binary-srcs.pex
Interestingly, even in local runs, the cache seems not to be working, unless I am reading things wrong
Copy code
14:13:42.51 [DEBUG] Running Building 4 requirements for projects.zeopp/binary-deps.pex from the third-party/resolves/python-default/pex.lock resolve: aiofiles>=24.1.0, flytekit>=1.15, pydantic-settings>=2.7.1, pydantic>=2.10.2 under semaphore with concurrency id: 10, and concurrency: 4
14:13:42.52 [DEBUG] Completed: Remote cache lookup for: Building 4 requirements for projects.zeopp/binary-deps.pex from the third-party/resolves/python-default/pex.lock resolve: aiofiles>=24.1.0, flytekit>=1.15, pydantic-settings>=... (23 characters truncated)
14:13:42.52 [DEBUG] remote cache miss for: "Building 4 requirements for projects.zeopp/binary-deps.pex from the third-party/resolves/python-default/pex.lock resolve: aiofiles>=24.1.0, flytekit>=1.15, pydantic-settings>=2.7.1, pydantic>=2.10.2" digest=Digest { hash: Fingerprint<a9ce0f1d862126b8cd21b8d9c79094deb2ea98f64e46082a5472e9b573a5d9a3>, size_bytes: 142 }
14:13:42.52 [DEBUG] Completed: setup_sandbox
f
Pants checks the cache itself by calling the REAPI GetActionResult method before it tries to submit an execution request.
skip_cache_lookup: true
is there to prevent the remote executor from duplicating the already-performed cache lookup.
Add the
--stats-log
option to dump Pants internal counters. You will see remote caching and remote execution counters including for caching the numbers of hits/misses; number of errors, etc.
👀 1
That should hopefully give you a better idea of what's going on since the metrics encompass all caching / execution requests.
s
At the moment I see that when I do a remote execution, it takes 20 seconds vs 6 seconds locally
so trying to diagnose that
f
python code? remote execution cannot use the "named" cache mechanism normally which local execution can use. Every build action needs to have Pex rebuild the venv used for executing the Python code.
✅ 1
remote execution is not a win for Pants / Pex and Python
There is an experimental flag to allow you to expose a writable directory on the remote execution system and have Pants use that for named caches: https://www.pantsbuild.org/stable/reference/global-options#remote_execution_append_only_caches_base_path But it is very much a work around and really would only work if there was a single worker.
✅ 1
s
Copy code
15:31:19.56 [DEBUG] Completed: `package` goal
15:31:19.56 [DEBUG] computed 1 nodes in 24.357563 seconds. there are 23023 total nodes.
15:31:19.56 [DEBUG] waiting for 3 session end task(s) to complete
15:31:19.56 [DEBUG] all session end tasks completed successfully
15:31:19.56 [DEBUG] Async completion is disabled: waiting for workunit callbacks to complete...
15:31:19.96 [INFO] Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
  docker_execution_requests: 0
  docker_execution_successes: 0
  local_cache_read_errors: 0
  local_cache_requests: 4
  local_cache_requests_cached: 0
  local_cache_requests_uncached: 4
  local_cache_total_time_saved_ms: 0
  local_cache_write_errors: 0
  local_execution_requests: 0
  local_process_total_time_run_ms: 0
  remote_cache_read_errors: 0
  remote_cache_request_timeouts: 0
  remote_cache_requests: 3
  remote_cache_requests_cached: 0
  remote_cache_requests_uncached: 3
  remote_cache_speculation_local_completed_first: 0
  remote_cache_speculation_remote_completed_first: 0
  remote_cache_total_time_saved_ms: 0
  remote_cache_write_attempts: 3
  remote_cache_write_errors: 0
  remote_cache_write_successes: 3
  remote_execution_errors: 0
  remote_execution_requests: 4
  remote_execution_rpc_errors: 0
  remote_execution_rpc_execute: 4
  remote_execution_rpc_retries: 0
  remote_execution_rpc_wait_execution: 0
  remote_execution_success: 4
  remote_execution_timeouts: 0
  remote_process_total_time_run_ms: 28356
  remote_store_exists_attempts: 7
  remote_store_exists_errors: 0
  remote_store_exists_successes: 7
  remote_store_missing_digest: 0
  remote_store_read_attempts: 3
  remote_store_read_cached: 3
  remote_store_read_errors: 0
  remote_store_read_uncached: 0
  remote_store_request_timeouts: 0
  remote_store_write_attempts: 14
  remote_store_write_errors: 0
  remote_store_write_successes: 14
Observation histogram summaries:
Summary of `local_store_read_blob_time_micros` observation histogram:
  min: 13
  max: 42783
  mean: 1582.825
  std dev: 3160.757
  total observations: 355
  sum: 561903
  p25: 439
  p50: 610
  p75: 1645
  p90: 4095
  p95: 4683
  p99: 4863
Summary of `remote_cache_get_action_result_network_time_micros` observation histogram:
  min: 5840
  max: 13367
  mean: 8872.000
  std dev: 3240.186
  total observations: 3
  sum: 26616
  p25: 5843
  p50: 7411
  p75: 7411
  p90: 13367
  p95: 13367
  p99: 13367
Summary of `remote_store_blob_bytes_uploaded` observation histogram:
  min: 141
  max: 1505
  mean: 712.071
  std dev: 597.585
  total observations: 14
  sum: 9969
  p25: 142
  p50: 142
  p75: 1383
  p90: 1487
  p95: 1487
  p99: 1505
Summary of `local_store_read_blob_size` observation histogram:
  min: 20
  max: 493311
  mean: 27821.865
  std dev: 74640.093
  total observations: 355
  sum: 9876762
  p25: 315
  p50: 1207
  p75: 10727
  p90: 81343
  p95: 183167
  p99: 405247
Summary of `remote_process_time_run_ms` observation histogram:
  min: 473
  max: 23583
  mean: 7090.250
  std dev: 9542.821
  total observations: 4
  sum: 28361
  p25: 473
  p50: 2149
  p75: 2163
  p90: 23583
  p95: 23583
  p99: 23583
Summary of `remote_store_blob_bytes_downloaded` observation histogram:
  min: 449
  max: 12519
  mean: 4785.333
  std dev: 5479.918
  total observations: 3
  sum: 14356
  p25: 449
  p50: 1391
  p75: 1391
  p90: 12519
  p95: 12519
  p99: 12519
Summary of `remote_execution_rpc_first_response_time_micros` observation histogram:
  min: 2
  max: 6
  mean: 4.250
  std dev: 1.479
  total observations: 4
  sum: 17
  p25: 2
  p50: 4
  p75: 5
  p90: 6
  p95: 6
  p99: 6
Summary of `remote_cache_get_action_result_time_micros` observation histogram:
  min: 5880
  max: 13399
  mean: 8909.333
  std dev: 3236.489
  total observations: 3
  sum: 26728
  p25: 5883
  p50: 7451
  p75: 7451
  p90: 13399
  p95: 13399
  p99: 13399
Summary of `remote_store_read_blob_time_micros` observation histogram:
  min: 1108
  max: 25247
  mean: 9232.333
  std dev: 11319.557
  total observations: 3
  sum: 27697
  p25: 1108
  p50: 1349
  p75: 1349
  p90: 25247
  p95: 25247
  p99: 25247
Summary of `remote_store_time_to_first_byte_micros` observation histogram:
  min: 1071
  max: 25199
  mean: 9191.333
  std dev: 11314.604
  total observations: 3
  sum: 27574
  p25: 1071
  p50: 1311
  p75: 1311
  p90: 25199
  p95: 25199
  p99: 25199

real    0m29.424s
user    0m3.666s
sys     0m0.516s
(venv) coder@sandrino:~/platform/grid$
f
Copy code
remote_cache_requests: 3
  remote_cache_requests_cached: 0
  remote_cache_requests_uncached: 3
s
so remote caching is all I can really enable, not remote execution
f
basically yeah
✅ 1
s
at least I know I can give up in it.