Does anyone have any tips for tracking down cache misses Pants #general

# general

powerful-scooter-95162

08/08/2025, 5:25 PM

Does anyone have any tips for tracking down cache misses?

fast-nail-55400

08/08/2025, 6:35 PM

Do you want to track down which specific processes are not hitting the cache, or why a particular process didn't hit the cache?

powerful-scooter-95162

08/08/2025, 6:36 PM

Not sure about the difference here, but we're seeing cache misses on requirements pexes in CI a lot

fast-nail-55400

08/08/2025, 6:37 PM

well Pants runs a whole bunch of processes for each invocation, some may hit the cache and some may not.

fast-nail-55400

08/08/2025, 6:37 PM

so you'd first have to know which were not hitting the cache.

powerful-scooter-95162

08/08/2025, 6:37 PM

We have BuildBuddy set up, so I am expecting cross-run hits

fast-nail-55400

08/08/2025, 6:38 PM

Generally, cache misses come from the cache key changing. For a process, the key is basically all of the CLI arguments and environment variables plus the input root.
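To make the point about cache keys concrete, here is a minimal sketch of the idea. It is not Pants's actual implementation (Pants computes keys from REAPI protobufs, as mentioned later in the thread). The function name `process_cache_key` and the JSON-plus-SHA-256 encoding are assumptions chosen for illustration only.

```python
import hashlib
import json


def process_cache_key(argv, env, input_root_digest):
    """Hash the CLI args, environment, and input root digest into one hex key.

    Hypothetical helper: it only illustrates that every one of these inputs
    feeds the key, so changing any of them yields a different key.
    """
    payload = json.dumps(
        {
            "argv": list(argv),
            "env": dict(env),
            "input_root_digest": input_root_digest,
        },
        sort_keys=True,  # make the encoding independent of dict ordering
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


argv = ["pex", "--output-file", "requirements.pex"]
base = process_cache_key(argv, {"PATH": "/usr/bin"}, "abc123")
same = process_cache_key(argv, {"PATH": "/usr/bin"}, "abc123")
changed_env = process_cache_key(argv, {"PATH": "/home/ci-42/bin:/usr/bin"}, "abc123")

print(base == same)         # identical inputs give an identical key
print(base == changed_env)  # a different PATH alone gives a different key
```

Under this model, a single environment variable that differs between CI runs is enough to turn every affected process into a cache miss.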

powerful-scooter-95162

08/08/2025, 6:38 PM

is there a way to dump all this to see what the difference in cache key is?

fast-nail-55400

08/08/2025, 6:39 PM

Yes, you can make use of my OpenTelemetry plugin to dump work units to a jsonl file. Then you'll see all of the relevant values if you drill down on the metadata for process-related work units.

🎉 1

🔥 1

fast-nail-55400

08/08/2025, 6:40 PM

https://github.com/shoalsoft/shoalsoft-pants-opentelemetry-plugin and set --shoalsoft-opentelemetry-exporter=json-file

fast-nail-55400

08/08/2025, 6:41 PM

So you might need to write a script to manipulate the JSON and extract the relevant information.

powerful-scooter-95162

08/08/2025, 6:42 PM

that's awesome, thank you!

powerful-scooter-95162

08/08/2025, 6:42 PM

I might try hooking this up to our Grafana

fast-nail-55400

08/08/2025, 6:42 PM

You can find documentation on the process-related metadata here: https://github.com/shoalsoft/shoalsoft-pants-opentelemetry-plugin/blob/main/docs/streaming-workunit-handlers.md#process-metadata

fast-nail-55400

08/08/2025, 6:43 PM

and on debugging cache misses: generally any system-specific paths or other values that change from run to run can contribute to cache misses.

powerful-scooter-95162

08/08/2025, 6:44 PM

When you say system-specific paths, you presumably mean ENV vars or something else?

fast-nail-55400

08/08/2025, 6:45 PM

yes, the paths might be in env vars or in CLI args

fast-nail-55400

08/08/2025, 6:45 PM

but yeah, PATH can be a culprit
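One rough way to spot such values is to scan a process's CLI args and env vars for paths that look machine- or run-specific. The sketch below is only a heuristic: the prefix list and the helper name `suspicious_values` are assumptions, not anything Pants provides.

```python
import re

# Assumption: example prefixes that often differ between machines or runs.
SUSPECT_PATH = re.compile(r"(/home/|/Users/|/tmp/|/var/folders/)")


def suspicious_values(argv, env):
    """Return labels of CLI args and env vars whose values contain suspect paths."""
    hits = [f"argv[{i}]" for i, arg in enumerate(argv) if SUSPECT_PATH.search(arg)]
    hits += [
        f"env[{name}]"
        for name, value in sorted(env.items())
        if SUSPECT_PATH.search(value)
    ]
    return hits


hits = suspicious_values(
    ["pex", "--output-file", "requirements.pex"],
    {"PATH": "/home/runner/.local/bin:/usr/bin", "LANG": "C.UTF-8"},
)
print(hits)  # ['env[PATH]']
```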

fast-nail-55400

08/08/2025, 6:46 PM

but basically with the work unit dump, you should be able to find the metadata of the same "process" from two different runs, and then diff the process metadata.
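The diffing step could be scripted roughly as follows. This is a sketch under stated assumptions: the field names used here (`description`, `metadata`, `env`, `argv`) are hypothetical stand-ins, so check the plugin's process-metadata documentation for the real keys in the JSONL output. The helpers `load_workunits`, `flatten`, and `diff_processes` are likewise made up for this example.

```python
import json


def load_workunits(lines):
    """Parse JSONL lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0": leaf} for easy comparison."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        flat.update(flatten(child, f"{prefix}.{key}" if prefix else str(key)))
    return flat


def diff_processes(run_a, run_b, match_key="description"):
    """Map each work unit found in both runs to its differing fields.

    `match_key` is an assumed field that identifies the "same" process
    across runs; the result maps its value to {path: (value_a, value_b)}.
    """
    by_key_b = {wu.get(match_key): wu for wu in run_b}
    result = {}
    for wu_a in run_a:
        name = wu_a.get(match_key)
        wu_b = by_key_b.get(name)
        if name is None or wu_b is None:
            continue
        flat_a, flat_b = flatten(wu_a), flatten(wu_b)
        changed = {
            path: (flat_a.get(path), flat_b.get(path))
            for path in sorted(set(flat_a) | set(flat_b))
            if flat_a.get(path) != flat_b.get(path)
        }
        if changed:
            result[name] = changed
    return result


# Inline sample data; in practice, read the lines from the two dumped files.
run_a = load_workunits([
    '{"description": "Building requirements.pex", '
    '"metadata": {"env": {"PATH": "/usr/bin"}, "argv": ["pex"]}}',
])
run_b = load_workunits([
    '{"description": "Building requirements.pex", '
    '"metadata": {"env": {"PATH": "/home/runner/bin"}, "argv": ["pex"]}}',
])
diff = diff_processes(run_a, run_b)
print(diff)
```

With real dumps, replace the inline samples with `load_workunits(open(path))` for each run's file and adjust `match_key` to whatever field reliably identifies a process.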

powerful-scooter-95162

08/08/2025, 7:21 PM

is there a way to say env vars shouldn't go into the cache key? we have some things like AWS_SECRET_ACCESS_KEY in our env vars...

fast-nail-55400

08/08/2025, 7:26 PM

not currently. Pants models process execution on REAPI and all environment variables are used as part of the cache key for that.

fast-nail-55400

08/08/2025, 7:26 PM

And you said you use BuildBuddy so they are based on REAPI as well and would do the same thing.

fast-nail-55400

08/08/2025, 7:27 PM

(mainly since they expose an REAPI cache interface which I assume you are using)

fast-nail-55400

08/08/2025, 7:29 PM

You are not the first to encounter an issue with AWS secret keys like that. You can find Pants issues which discuss trying to find ways to not have to include AWS secret keys in env. https://github.com/pantsbuild/pants/issues?q=is%3Aissue%20state%3Aopen%20keyring

fast-nail-55400

08/08/2025, 7:29 PM

I had written a PoC PR as well along those lines to expose keyring support.

powerful-scooter-95162

08/08/2025, 9:17 PM

I feel like the better solution here is for pants to not be opinionated about the ENV var going into the cache key

powerful-scooter-95162

08/08/2025, 9:17 PM

and let users specify which ENV vars should be part of the cache key and which should not

fast-nail-55400

08/08/2025, 9:20 PM

The opinion came from using REAPI as the model and basically using REAPI protobufs for computing the cache key even for local cache.

powerful-scooter-95162

08/08/2025, 9:24 PM

I don't think there is any cost to deviating from REAPI's model for how cache keys should be constructed when users configure pants to act differently

powerful-scooter-95162

08/08/2025, 9:24 PM

REAPI codifies a very conservative set of assumptions

powerful-scooter-95162

08/08/2025, 9:24 PM

but users know when those assumptions are too conservative

powerful-scooter-95162

08/08/2025, 9:26 PM

I think I found a workaround for the aws keys by sticking them in ~/.aws/credentials but this seems unnecessary

fast-nail-55400

08/08/2025, 9:53 PM

> I don't think there is any cost to deviating from REAPI's model for how cache keys should be constructed when users configure pants to act differently

I perceive the following costs though:

1. The REAPI cache assumptions are still in play for remote_environment, so deviation creates a semantic difference between local_environment and remote_environment, since Pants does not have control over REAPI servers. This semantic difference is a cost which would need to be documented and taught to Pants users. It is also a complication on anyone trying to contribute to Pants to learn how Pants handles processes.

2. One-off transaction cost for change: The REAPI assumptions are in place all over the Pants code base and so the change is very much non-trivial. Moreover, someone would actually have to do the work whether by contributing the work or hiring someone to do so. (From what I currently know, I don't believe any maintainers have an (inherent) interest in making this sort of fundamental change currently, but I could be wrong on that point. As for me, I do paid work on Pants; someone would need to hire me to design and make this sort of change.)

powerful-scooter-95162

08/08/2025, 9:55 PM

that's fair, I honestly didn't think pants worked with REAPI execution, I thought it only did caching

fast-nail-55400

08/08/2025, 9:55 PM

Frankly, there are a bunch of points on which I'd love to deviate from REAPI to try and innovate on how Pants does execution:

1. Environment variables / secrets management

2. Append-only named cache support (not currently supported with remote_environment except via a hacky advanced option and the need to control the server environment)

3. Persistent workers

fast-nail-55400

08/08/2025, 9:56 PM

> I honestly didn't think pants worked with REAPI execution

The lack of named cache support in REAPI makes the Python backend practically unusable with remote_environment, since Pex builds venvs in the named cache. And since I perceive that most people use Pants for its Python support, that sort of makes remote_environment useless.
