# general
p
Is there a standard practice to share remote cache between CI and dev machines? I'm getting a lot more cache misses than expected
FWIW, I was looking at a build where the "Building requirements for blah.pex" step seemed like it should be cacheable
f
Different build systems handle addressing a bit differently. One of the harder aspects of remote caching is artifact resolution in non-homogeneous systems. CI usually works well because all CI machines look the same: environment, operating system, toolchains. Developer laptops are usually more chaotic and require some normalization of the environment so they match the backends that populate the cache. Depending on security/company needs, you could make developer laptops homogeneous between developers and let them upload directly to a cache that doesn't cross the CI/release path, usually with a caching system that verifies the hash sums of what is uploaded/downloaded to prevent poisoning (if using the Google-defined REAPI). Another place caching can fall short is hermeticity and reproducibility of the build artifacts; sometimes that's build-tool-specific configuration, toolchains/generators, or general build practices.
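The hash-verification idea mentioned there can be sketched as follows. This is a toy model, not a real REAPI client; the function name and payload are made up for illustration:

```python
import hashlib

def verify_cas_blob(expected_digest, blob):
    """In a content-addressed store (like REAPI's CAS), every blob is keyed
    by the hash of its own bytes, so any client or proxy can re-hash what it
    fetches and reject anything that doesn't match -- the anti-poisoning
    check described above (simplified sketch)."""
    actual = hashlib.sha256(blob).hexdigest()
    if actual != expected_digest:
        raise ValueError(f"digest mismatch: expected {expected_digest}, got {actual}")
    return blob

payload = b"artifact bytes"
digest = hashlib.sha256(payload).hexdigest()
verify_cas_blob(digest, payload)        # passes: bytes match the digest key
# verify_cas_blob(digest, b"tampered")  # would raise ValueError
```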
p
Yeah, I get that, I feel like "resolve this set of dependency constraints to a set of packages and install them in a venv" should be the same across 2 linux/x64 machines though
It makes me think pants is putting too much external state into the cache key
not sure if this is configurable somehow, or I'm just out of luck
f
FWIW, a few ways I've debugged such things (I don't know the exact way with Pants; would be great if someone could chime in on that): in Bazel, use the execution log and gRPC logs, so you can see which hashes went into the execution context and which gRPC requests were made between two builds of the same git commit on different machines. If there is divergence in the hashes or keys, then start narrowing down where in the process or flow the divergence happens.
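In Bazel terms, that workflow looks roughly like this. Treat it as a sketch: the exact flag names vary across Bazel versions, and the file paths here are placeholders:

```shell
# Run the same build on each machine at the same git commit, recording
# what went into each action and what was sent over gRPC.
bazel build //... \
  --execution_log_json_file=/tmp/exec_log.json \
  --experimental_remote_grpc_log=/tmp/grpc.log

# Then diff the two execution logs (one per machine, renamed here for
# illustration) to find which action inputs diverge: env vars, input
# digests, or command lines.
diff <(jq -S . /tmp/exec_log_ci.json) <(jq -S . /tmp/exec_log_dev.json)
```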
From a backend POV, you could have the backend log the requests from the two builds to see which keys have changed between the two build machines
p
I'm definitely getting different fingerprints for the requests to the remote cache, so it's not a backend issue
b
What platform is ci and what platform is the dev machine? Can you share your pants.toml too?
p
I am realizing that the pants.ci.toml file may be part of the problem in general usage, but I also tried using that file locally and still had cache misses:
```toml
[GLOBAL]
pants_version = "2.20.0rc1"
build_file_prelude_globs = ["pants-plugins/macros.py"]

backend_packages = [
  "pants.backend.docker",
  "pants.backend.python",
  "pants.backend.python.lint.black",
  "pants.backend.python.typecheck.mypy",
  "pants.backend.shell",
  "pants.backend.shell.lint.shfmt",
  "pants.backend.shell.lint.shellcheck",
  "pants.backend.experimental.java",
  "pants.backend.experimental.kotlin",
  "pants.backend.experimental.python",  # for vcs_version
  "pants.backend.experimental.terraform",
]

pants_ignore.add = ["!gcloud_key.json", "!keys.*", "!anubis/"]

remote_provider = "reapi"
remote_cache_read = true
remote_cache_write = true
remote_store_address = "grpcs://remote.buildbuddy.io"
remote_instance_name = "main"
remote_cache_warnings = "always"

[GLOBAL.remote_store_headers]
# BuildBuddy API key to Espresso AI org created by kilogram@. To rotate, create a new org.
x-buildbuddy-api-key = "XXX"

[source]
root_patterns = [
  "/",
  "anubis/spes/src/main/resources",
  "anubis/spes/src/main/java",
  "anubis/spes/src/main/kotlin",
]

[anonymous-telemetry]
enabled = true
repo_id = "d0a16741-97cc-471e-bd04-7e7622b63146"

[python]
interpreter_constraints = ["CPython>=3.11,<3.12"]
pip_version = "latest"
enable_resolves = true

[python-infer]
use_rust_parser = true

[kotlin]
version_for_resolve = "{'jvm-default': '1.9.0'}"

[jvm]
jdk = "temurin:1.11"

[repl]
shell = "ipython"

[python.resolves]
python-default = "3rdparty/python/default.lock"
mypy = "3rdparty/python/mypy.lock"
skypilot = "3rdparty/python/skypilot.lock"

[mypy]
install_from_resolve = "mypy"
requirements = ["//3rdparty/python:mypy"]
# args = ["--check-untyped-defs"]
args = ["--no-incremental"]

[python-repos]
indexes = [
  "https://pypi.org/simple/",
  "https://download.pytorch.org/whl/cu118/",
  "https://pypi.fury.io/xxx/",
]

[shellcheck]
args = [
  "-e SC1091",  # Docker images use scripts from base layers, which cannot be found.
]

[subprocess-environment]
env_vars = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_DEFAULT_REGION",
  "ESPRESSO_AT_HEAD",
  "GITHUB_ACTIONS",
  "GITHUB_SHA",
]

[test]
extra_env_vars = ["HOME"]
output = "all"

[pytest]
args = ["-vv", "--no-header", "--log-cli-level=INFO", "-rP"]
```
CI is github actions (presumably linux x64), local is also linux x64
b
I suspect a major source of differences will be the env vars passed through in `subprocess-environment` and `test`, since those will differ per machine, usually.
(And those are part of the cache key)
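As a toy model of why that causes misses (this is not Pants' actual fingerprinting logic, just an illustration of env vars entering a cache key):

```python
import hashlib

def process_fingerprint(argv, env):
    """Toy cache key: hash the command line plus the captured env vars.
    Any per-machine env value changes the digest, so the remote cache
    lookup misses even though the command is identical."""
    h = hashlib.sha256()
    for arg in argv:
        h.update(arg.encode())
    for key in sorted(env):  # sorted so ordering can't cause spurious misses
        h.update(f"{key}={env[key]}".encode())
    return h.hexdigest()

argv = ["pex", "--lock", "3rdparty/python/default.lock"]
ci_env = {"AWS_ACCESS_KEY_ID": "key-on-ci", "GITHUB_SHA": "abc123"}
dev_env = {"AWS_ACCESS_KEY_ID": "key-on-laptop", "GITHUB_SHA": ""}

# Same command, different env values -> different keys.
print(process_fingerprint(argv, ci_env) == process_fingerprint(argv, dev_env))  # prints False
```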
p
That seems overly cautious for things like resolving dependencies, where theoretically pants knows which env vars are actually used
are there any workarounds for that?
I'm going to try bumping the pants version too, there was a bug with dep lists not being sorted appropriately and see if that helps too
probably as expected, bumping the version to 2.22.0a0 didn't help
b
Don't know of workarounds in the short term other than reducing the environment variables / only passing them to the specific processes where each one is required.
Potentially pants should allow finer-grained control over which processes need which env var. It sounds like some of these may only be required for `generate-lockfiles`, or are they also used for other goals?
p
the AWS ones are largely for running some integration tests iirc, not even needed to do a build, maybe moving them to the test section would help.
all of them might actually fall into that category tbh
in the non-CI config we have aws env vars for auth when pulling/pushing docker images to ECR
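Based on the config pasted above, one possible narrowing would be to move the AWS vars out of the global `subprocess-environment` list and scope them to test runs. This is a sketch only: whether it's safe depends on which goals actually consume those vars (e.g. the docker/ECR auth mentioned above would still need them in the non-CI config):

```toml
# Sketch: scope AWS credentials to tests so ordinary build processes
# (and their cache keys) no longer see per-machine values.
[subprocess-environment]
env_vars = [
  "ESPRESSO_AT_HEAD",
  "GITHUB_ACTIONS",
  "GITHUB_SHA",
]

[test]
extra_env_vars = [
  "HOME",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_DEFAULT_REGION",
]
```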