# general
b
Is it expected that changing pants versions invalidates the local cache? To validate our future upgrade from 2.14.0, I ran both 2.15.0rc0 and 2.16.0.dev2 through our CI, without other code changes, against a bulk-cached `~/.cache/pants/lmdb_store` (not a remote cache), and both of them seemed to run from scratch (metrics like `local_cache_requests_cached: 0`). Running locally seemed similar too, but I didn't pay that much attention. (Both upgrades seemed to work fine in terms of delivering a working deploy, at least.)
b
The process structure can change between versions, meaning the cache key for the same process is likely different.
👍 1
Over time, this may smooth out, but we're still features-ablazin'
👍 1
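To illustrate the point about process structure: a minimal sketch, assuming a cache key is a hash over a serialized process description. The dict layout, the field names, and `process_cache_key` are hypothetical and are not Pants' actual `Process` type or key derivation.

```python
import hashlib
import json


def process_cache_key(process: dict) -> str:
    """Hash a canonical serialization of a (hypothetical) process description."""
    canonical = json.dumps(process, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The "same" process as described by an older version (illustrative fields only).
old_process = {
    "argv": ["python", "-m", "pytest", "tests/"],
    "env": {"LANG": "C"},
    "input_digest": "abc123",
}

# A newer version describes the same work but adds one field to the structure.
new_process = {**old_process, "hypothetical_new_field": ""}

# An identical description always yields the same key...
same = process_cache_key(old_process) == process_cache_key(dict(old_process))
# ...but any structural change yields a different key, so old entries are never hit.
changed = process_cache_key(old_process) != process_cache_key(new_process)

print(same, changed)
```

Under that assumption, even a field that has no effect on the command's output changes the key, which would look like a cold cache after an upgrade.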
b
Cool cool, sounds like nothing for me to worry about, except maybe busting the cache to start fresh since all that old data doesn’t need to be carried around. (And/or just getting approval for a remote cache) Thanks!
b
(There's a chance one of the smarter folks chimes in and I'm wrong, so stay tuned.) But that'd explain process cache invalidation. I guess that wouldn't explain digest cache invalidation 🤔
b
No worries! Two questions (more for my own understanding of pants' internals than related to the original questions):
1. Can there be reuse/caching of digests even if no processes can be reused? (As in, can one end up with identical digests even with different process executions? My very loose understanding is that digest caching is basically snapshotting info associated with process executions.)
2. It seems like the metric quoted above only covers process executions (based on grepping for `Metric::LocalCacheRequestsUncached`), and there's no direct equivalent for digest cache requests. There's only something along the lines of 'cached' `local_store_read_blob_size`/`ObservationMetric::LocalStoreReadBlobSize`, which isn't directly useful for answering "is pants reusing the digest cache from a previous build" (e.g. that metric has a large total sum like `16538104868`, but that could easily be rereading data cached within that build). Is my interpretation correct?
b
For 1. Yes. A digest is like a snapshot of files on disk, identified by a fingerprint. When pants wants to put files on disk, it asks the "store" to do so by providing the fingerprint. This happens in the context of process sandboxes, but isn't directly related to processes. So no matter how much pants internals change, if the files that make up a digest are the same, the cache key stays the same.
For 2. I'd have to look at the code to see if there's a metric for digest caches. I don't know off the top of my head.
b
1. Ah and the fingerprint is just the contents, ignoring mtimes, etc? Makes sense, and I guess it also makes sense that the process execution is more “interesting” in terms of caching: I imagine copying files around isn’t a huge time cost generally.
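The thread doesn't confirm the mtime question, but here is a minimal sketch of what "the fingerprint is just the contents" would mean, assuming the fingerprint is a plain SHA-256 over the file bytes. `content_fingerprint` is a hypothetical helper, not a Pants API.

```python
import hashlib
import os
import tempfile


def content_fingerprint(path: str) -> str:
    """Fingerprint a file by its bytes only; metadata such as mtime is never read."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello\n")
    before = content_fingerprint(path)

    # Change only the access/modification times; the bytes stay the same.
    os.utime(path, (0, 0))
    after_touch = content_fingerprint(path)

    # Change the bytes themselves.
    with open(path, "wb") as f:
        f.write(b"hello, world\n")
    after_edit = content_fingerprint(path)

print(before == after_touch, before == after_edit)
```

With a content-only fingerprint like this, touching a file leaves the key unchanged, and only an actual edit produces a new one.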