# general
p
Does anyone have any tips for tracking down cache misses?
f
Do you want to track down which specific processes are not hitting the cache, or why a particular process didn't hit the cache?
p
Not sure about the difference here, but we're seeing cache misses on requirements pexes in CI a lot
f
well Pants runs a whole bunch of processes for each invocation, some may hit the cache and some may not.
so you'd first have to know which were not hitting the cache.
p
We have BuildBuddy set up, so I am expecting cross-run hits
f
generally cache misses come from the cache key changing, which for a process is basically all of the CLI arguments and environment variables plus the input root
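To make that concrete, here is a minimal sketch of the idea. It is not how Pants actually computes keys (Pants uses REAPI protobufs); the function name `process_cache_key`, the JSON encoding, and the sample values are all assumptions for illustration only.

```python
import hashlib
import json


def process_cache_key(argv, env, input_root_digest):
    """Hash everything that identifies a process; any change yields a new key.

    Illustrative only: the real key is derived from REAPI protobufs.
    """
    payload = json.dumps(
        {"argv": list(argv), "env": dict(env), "input_root": input_root_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical process: same arguments and inputs, only PATH differs.
argv = ["pex", "--output-file", "requirements.pex"]
key_a = process_cache_key(argv, {"PATH": "/usr/bin"}, "abc123")
key_b = process_cache_key(argv, {"PATH": "/usr/local/bin:/usr/bin"}, "abc123")

print(key_a == key_b)  # False: a single differing env var changes the key
```

So a single machine-specific value in the arguments or environment is enough to turn an expected hit into a miss.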
p
is there a way to dump all this to see what the difference in cache key is?
f
Yes, you can make use of my OpenTelemetry plugin to dump work units to a jsonl file. Then you'll see all of the relevant values if you drill down on the metadata for process-related work units.
https://github.com/shoalsoft/shoalsoft-pants-opentelemetry-plugin and set --shoalsoft-opentelemetry-exporter=json-file
So you might need to write a script to manipulate the JSON and extract the relevant information.
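A rough sketch of what such a script could look like. The plugin's actual JSON schema is not shown here, so the top-level `name` field and the sample records below are assumptions; check a real dump and adjust the field names.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"dotted.path": leaf_value} pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def load_records(lines, name_substring):
    """Parse JSONL lines; keep flattened records whose (assumed) "name" matches."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if name_substring in str(record.get("name", "")):
            records.append(flatten(record))
    return records


# Made-up sample lines standing in for a real dump file.
sample = [
    '{"name": "process", "attributes": {"env": {"PATH": "/usr/bin"}}}',
    '{"name": "other"}',
]
records = load_records(sample, "process")
print(records)
```

For a real dump you would pass an open file object instead of `sample`, since iterating over a file yields its lines.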
p
that's awesome, thank you!
I might try to hook this up to our Grafana
f
and on debugging cache misses: generally any system-specific paths or other values that change from run to run can contribute to cache misses.
p
When you say system-specific paths, you presumably mean ENV vars or something else?
f
yes, the paths might be in env vars or in CLI args
but yeah, PATH can be a culprit
but basically with the work unit dump, you should be able to find the metadata of the same "process" from two different runs, and then diff the process metadata.
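The diff step could be sketched like this, assuming you have already pulled the metadata for the same process out of each run as flat dicts. The keys and values below are invented examples, not the plugin's real field names.

```python
MISSING = "<missing>"  # placeholder for a key present in only one run


def diff_metadata(run_a, run_b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    changes = {}
    for key in sorted(set(run_a) | set(run_b)):
        a = run_a.get(key, MISSING)
        b = run_b.get(key, MISSING)
        if a != b:
            changes[key] = (a, b)
    return changes


# Hypothetical metadata for the "same" process in two CI runs.
run_a = {"argv.0": "pex", "env.PATH": "/usr/bin", "env.HOME": "/home/ci"}
run_b = {
    "argv.0": "pex",
    "env.PATH": "/opt/bin:/usr/bin",
    "env.HOME": "/home/ci",
    "env.AWS_SECRET_ACCESS_KEY": "redacted",
}

for key, (a, b) in diff_metadata(run_a, run_b).items():
    print(f"{key}: {a!r} -> {b!r}")
```

Anything that shows up in this diff is a candidate for what changed the cache key between the two runs.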
p
is there a way to say env vars shouldn't go into the cache key? we have some things like AWS_SECRET_ACCESS_KEY in our env vars...
f
not currently. Pants models process execution on REAPI and all environment variables are used as part of the cache key for that.
And you said you use BuildBuddy so they are based on REAPI as well and would do the same thing.
(mainly since they expose an REAPI cache interface which I assume you are using)
You are not the first to encounter an issue with AWS secret keys like that. You can find Pants issues which discuss trying to find ways to not have to include AWS secret keys in env. https://github.com/pantsbuild/pants/issues?q=is%3Aissue%20state%3Aopen%20keyring
I had written a PoC PR as well along those lines to expose keyring support.
p
I feel like the better solution here is for pants to not be opinionated about the ENV var going into the cache key
and let users specify which ENV vars should be part of the cache key and which should not
f
The opinion came from using REAPI as the model and basically using REAPI protobufs for computing the cache key even for local cache.
p
I don't think there is any cost to deviating from REAPI's model for how cache keys should be constructed when users configure pants to act differently
REAPI codifies a very conservative set of assumptions
but users know when those assumptions are too conservative
I think I found a workaround for the aws keys by sticking them in ~/.aws/credentials but this seems unnecessary
f
> I don't think there is any cost to deviating from REAPI's model for how cache keys should be constructed when users configure pants to act differently
I perceive the following costs though:
1. The REAPI cache assumptions are still in play for remote_environment, so deviation creates a semantic difference between local_environment and remote_environment since Pants does not have control over REAPI servers. This semantic difference is a cost which would need to be documented and taught to Pants users. It is also a complication for anyone trying to contribute to Pants and learn how Pants handles processes.
2. One-off transaction cost for change: The REAPI assumptions are in place all over the Pants code base and so the change is very much non-trivial. Moreover, someone would actually have to do the work, whether by contributing the work or hiring someone to do so. (From what I currently know, I don't believe any maintainers have an (inherent) interest in making this sort of fundamental change currently, but I could be wrong on that point. As for me, I do paid work on Pants; someone would need to hire me to design and make this sort of change.)
p
that's fair, I honestly didn't think pants worked with REAPI execution, I thought it only did caching
f
Frankly there are a bunch of points on which I'd love to deviate from REAPI to try and innovate on how Pants does execution:
1. Environment variables / secrets management
2. Append-only named cache support (not currently supported with remote_environment except via a hacky advanced option and the need to control the server environment)
3. Persistent workers
> I honestly didn't think pants worked with REAPI execution
The lack of named cache support in REAPI makes the Python backend practically unusable with remote_environment since Pex builds venvs in the named cache. And since I perceive most people use Pants for its Python support, that sort of makes remote_environment useless.