Is it expected that changing pants versions invalidates the local cache? (Pants #general)

broad-processor-92400

12/12/2022, 2:17 AM

Is it expected that changing pants versions invalidates the local cache? To validate our future upgrade from 2.14.0, I ran both 2.15.0rc0 and 2.16.0.dev2 through our CI, without other code changes, against a bulk-cached `~/.cache/pants/lmdb_store` (not remote cache), and both of them seemed to run from scratch (metrics like `local_cache_requests_cached: 0`). Running locally seemed to be similar, too, but I didn't pay that much attention. (Both upgrades seemed to work fine in terms of delivering a working deploy, at least ✅)

bitter-ability-32190

12/12/2022, 2:29 AM

The process structure can change between versions, meaning the cache key for the same process is likely different.

👍 1
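To illustrate the point about process cache keys, here is a minimal sketch, not Pants' actual implementation. It assumes a key that hashes everything describing the process; the function `process_cache_key` and its fields (`argv`, `env`, `input_digest`) are hypothetical names made up for this example. If a new version changes anything about how the process is constructed, such as adding a flag, the key changes and the old cache entry is never looked up.

```python
import hashlib
import json


def process_cache_key(argv, env, input_digest):
    """Hypothetical cache key: a hash over the full process description."""
    payload = json.dumps(
        {"argv": argv, "env": env, "input_digest": input_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same inputs and same intent, but the "newer version" builds a slightly
# different command line for the same logical step.
old_key = process_cache_key(
    ["python", "-m", "pytest", "tests/"], {"LANG": "C"}, "abc123"
)
new_key = process_cache_key(
    ["python", "-m", "pytest", "--no-header", "tests/"], {"LANG": "C"}, "abc123"
)
# An identical description always maps to the same key.
repeat_key = process_cache_key(
    ["python", "-m", "pytest", "tests/"], {"LANG": "C"}, "abc123"
)

print(old_key == new_key)
print(old_key == repeat_key)
```

Under these assumptions, the input files did not change at all, yet the lookup misses because the process description did.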

bitter-ability-32190

12/12/2022, 2:30 AM

Over time, this may smooth out, but we're still features-ablazin'

👍 1

broad-processor-92400

12/12/2022, 2:31 AM

Cool cool, sounds like nothing for me to worry about, except maybe busting the cache to start fresh since all that old data doesn’t need to be carried around. (And/or just getting approval for a remote cache) Thanks!

bitter-ability-32190

12/12/2022, 2:36 AM

(There's a chance one of the smarter folks chimes in and I'm wrong, so stay tuned.) But that'd explain process cache invalidation. I guess that wouldn't explain digest cache invalidation 🤔

broad-processor-92400

12/12/2022, 5:42 AM

No worries! Two questions (more for my own understanding of pants' internals than related to the original questions):

1. Can there be reuse/caching of digests even if no processes can be reused? (As in, can one end up with identical digests even with different process executions? My very loose understanding is that digest caching is basically snapshotting info associated with process executions?)

2. It seems like the metric quoted above is only process executions (based on grepping for `Metric::LocalCacheRequestsUncached`), and there's no direct equivalent for digest cache requests; there's only something along the lines of 'cached' `local_store_read_blob_size`/`ObservationMetric::LocalStoreReadBlobSize`, which isn't directly useful for answering "is pants reusing the digest cache from a previous build" (e.g. that metric has a large total sum like `16538104868`, but that could easily be rereading data cached within that build). Is my interpretation correct?

bitter-ability-32190

12/12/2022, 12:28 PM

For 1: Yes. A digest is like a snapshot of files on disk, identified by a fingerprint. When pants wants to put files on disk, it asks the "store" to do so by providing the fingerprint. This happens in the context of process sandboxes, but isn't directly related to processes. So no matter how much pants internals change, if the files that make up a digest are the same, the cache key stays the same.
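A rough sketch of that idea, assuming the fingerprint is a SHA-256 of the file contents paired with the byte length. The `fingerprint` helper and the file names are hypothetical and this is not the real store code; it only shows why content-addressed keys survive changes that are unrelated to the bytes themselves, such as a different path or modification time.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Hypothetical content fingerprint: (sha256 hex of the bytes, byte length)."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha256(data).hexdigest(), len(data)


content = b"hello pants\n"

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")
    path_b = os.path.join(tmp, "b.txt")
    for path in (path_a, path_b):
        with open(path, "wb") as f:
            f.write(content)

    # Give the second file a very different access/modification time.
    os.utime(path_b, (0, 0))

    fp_a = fingerprint(path_a)
    fp_b = fingerprint(path_b)

print(fp_a == fp_b)
```

Because only the bytes feed the hash, the two files share one key even though their names and timestamps differ.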

bitter-ability-32190

12/12/2022, 12:29 PM

For 2: I'd have to look at the code to see if there's a metric for digest caches. I don't know off the top of my head.

broad-processor-92400

12/12/2022, 6:50 PM

1. Ah and the fingerprint is just the contents, ignoring mtimes, etc? Makes sense, and I guess it also makes sense that the process execution is more “interesting” in terms of caching: I imagine copying files around isn’t a huge time cost generally.
