Hey all :wave: Are the files (`*.zip`, `*.tar.xz`,...
# general
a
Hey all 👋 Are the files (`*.zip`, `*.tar.xz`, etc.) downloaded during the `TemplatedExternalTool` setup cached somehow? If yes, where/how are those files cached?
w
They're in the lmdb store, under the hash name of the file.
If you use `--keep-sandboxes=always` you can find them in your sandbox, e.g.
-r-xr-xr-x    1 sj  staff  4303745 Jul  3 14:37 science-macos-aarch64
Buuut, in terms of cache (~/.cache/pants/lmdb_store/immutable/files/d1)
-r-xr-xr-x   1 sj  staff  4303745 Jul  3 14:39 d1e6eefd9bc89f2edb39775435ee25ad7fd5b158431561ac6fbbbf1552f855c0
There is probably a way to make them exportable though? But I'm not sure
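To line up a sandbox file with its cache entry, something like the following might help. It is only a sketch under two assumptions: that the 64-character file name in the listing is the SHA-256 of the file content, and that the layout is `immutable/files/<first two hex chars>/<full hash>`, which is inferred from the single listing above and nothing else. `guessed_store_path` is a hypothetical helper, not a Pants API.

```python
import hashlib
from pathlib import Path

# Assumed store root, taken from the path quoted in the chat.
DEFAULT_STORE_ROOT = Path("~/.cache/pants/lmdb_store")


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def guessed_store_path(content: bytes, store_root: Path = DEFAULT_STORE_ROOT) -> Path:
    """Guess where a file with this content would sit in the store.

    Assumed layout (inferred from the listing, not from Pants docs):
    <store_root>/immutable/files/<first two hex chars>/<full digest>
    """
    digest = sha256_hex(content)
    return store_root.expanduser() / "immutable" / "files" / digest[:2] / digest
```

For example, `guessed_store_path(Path("science-macos-aarch64").read_bytes())` gives a candidate path to compare against what is actually on disk.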
a
I see. Is the digest of the file also used during the cache lookup? In the case where I have the same file name but I've changed the version of the external tool, is pants gonna download the new file and invalidate the old one in the cache?
w
It'll probably keep both, as the download is an append-only cache
a
Okay, but I shouldn't have any problems having the same .zip file name for different versions of my external tool, right?
w
I'm not sure I understand what you mean, but it's based on content when it comes to the cache part.
a
I'm currently storing my binaries on S3. The final zip file always has the same name because the version is a folder in the path, e.g. `s3://my_bucket/my_external_tool/1.0.0/my_tool.zip`, so every time I download my external tool it will download `my_tool.zip`, but the content can differ depending on the version.
w
Ahh, interesting... It's probably worth looking at this to see if there is any pre-download optimization it tries to go through (e.g. if it conditionally tries not to download). Also, running with `-ldebug` might provide some insight: https://github.com/pantsbuild/pants/blob/50a4e75b69321f3a2d3cb110433144c7f586ae38/src/python/pants/core/util_rules/external_tool.py#L353
a
Got it, makes sense. Thank you very much for all the explanations.
w
👍 I'd be curious what you find out. My gut feel is that, if you're already providing a digest value as part of the external tool downloads, then Pants wouldn't try to re-download if that's already in cache - as in, that's the cache key of interest
I can't imagine it would eagerly download a file of some size just to check the hash
a
Yeah, in that case I would have problems when eventually specifying a different external tool version, as the new file wouldn't be downloaded and the old one would be always used.
w
Well, we're generally expecting a known size and known hash for supply chain reasons, but it sounds like you want to bypass that?
Here is an example where we just basically have a changing hash: https://github.com/pantsbuild/pants/blob/50a4e75b69321f3a2d3cb110433144c7f586ae38/src/python/pants/backend/shell/subsystems/shunit2.py#L11
At the very least, I assume you'd need to update something in your codebase with the new hash?
a
Hmm, my case is exactly like this `shunit2` one, so I guess my question is: if I specify a different shunit2 version, would Pants download the newer version or use an already downloaded `shunit2` if one is present in the cache? I mean, I could change my tool publishing to append the version to the zip file name just to be sure it won't cause any problems, but I was wondering if Pants already handles that.
w
I haven't tested it out, but it should pick the version you specify, however you're specifying versions. For example, in your pants.toml, you can update the digest, or known_versions, or whatever your mechanism is
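To make the "pick the version you specify" point concrete, here is a rough sketch of how a version could be matched against `known_versions` entries. It assumes the pipe-separated `version|platform|sha256|length` entry format; the hashes and lengths are placeholders, and `parse_known_version` and `select_known_version` are hypothetical helpers, not Pants functions. The point is that each version carries its own expected hash and size, so two versions that share a file name like `my_tool.zip` still resolve to different digests.

```python
from typing import NamedTuple, Sequence


class KnownVersion(NamedTuple):
    version: str
    platform: str
    sha256: str
    length: int


def parse_known_version(entry: str) -> KnownVersion:
    """Parse one 'version|platform|sha256|length' entry (assumed format)."""
    version, platform, sha256, length = entry.split("|")[:4]
    return KnownVersion(version, platform, sha256, int(length))


def select_known_version(
    known_versions: Sequence[str], version: str, platform: str
) -> KnownVersion:
    """Return the entry matching the requested version and platform."""
    for entry in known_versions:
        candidate = parse_known_version(entry)
        if candidate.version == version and candidate.platform == platform:
            return candidate
    raise LookupError(f"no known version for {version} on {platform}")


# Placeholder values for illustration only: not real hashes or sizes.
KNOWN_VERSIONS = [
    "1.0.0|linux_x86_64|" + "a" * 64 + "|1000",
    "2.0.0|linux_x86_64|" + "b" * 64 + "|2000",
]

chosen = select_known_version(KNOWN_VERSIONS, "2.0.0", "linux_x86_64")
```

Switching the requested version from `1.0.0` to `2.0.0` changes the expected digest, so a cached copy of the old file would not satisfy the new request even though the file name is identical.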
a
Got it. Thanks a lot for all the explanations, it really helped!
w
No prob!
b
(Just confirming the earliest question: the cache is based on the content of the files, not their filenames. This means two things: files with the same name but different content are correctly cached separately, and files with different names but the same content are deduplicated in the cache, saving disk space.)
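A toy model of that behaviour, in case it helps. This is a minimal sketch of a content-addressed store in general, not Pants's actual implementation; `ContentStore` is a made-up class and SHA-256 is assumed as the hash.

```python
import hashlib


class ContentStore:
    """Toy append-only store keyed by the SHA-256 of the content, not the filename."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content under its digest; identical content is stored once."""
        key = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
# Same file name (my_tool.zip) in two version folders, different bytes.
v1_key = store.put(b"bytes of 1.0.0/my_tool.zip")
v2_key = store.put(b"bytes of 2.0.0/my_tool.zip")
# A differently named file with the same bytes as the 1.0.0 archive.
copy_key = store.put(b"bytes of 1.0.0/my_tool.zip")
```

Here `v1_key` and `v2_key` differ, so both versions coexist, while `copy_key` equals `v1_key`, so the duplicate takes no extra space.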