I'm about 17% sure I've seen this before somewhere...
# plugins
w
I'm about 17% sure I've seen this before somewhere, but do we have an example of a Subsystem that can download/install OR can be specified via a host search path? There are lots of examples with one or the other, but I thought I saw a Subsystem that allowed installing it, or providing a search path
h
Nothing comes to mind.
w
I suppose 17% sure also means 83% unsure....
h

https://frinkiac.com/video/S03E14/I2SXzKbSpMYKnMuQ-6tUnmaWBwA=.gif

f
I had tried experimenting with this sort of feature for the terraform backend.
w
So conceptually, I want to be able to use clang/gcc (likely locally installed), but the embedded ARM gcc toolchain can be downloaded and referenced. However, they recently changed their setup, so instead of a 1GB setup, it's now 5GB, which is much lamer
But, would be nice to be able to point at any of them (from a single subsystem)
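A rough sketch of that "host search path OR download" choice, in plain Python rather than the Pants plugin API. Every name here (`ToolchainRequest`, `ToolchainLocation`, `resolve_toolchain`, the option fields) is hypothetical and only illustrates the decision a single subsystem would have to make; it is not how any existing Pants Subsystem is implemented.

```python
import os
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToolchainRequest:
    """Hypothetical options one subsystem might expose."""

    binary_name: str                    # e.g. "arm-none-eabi-gcc"
    search_paths: Sequence[str] = ()    # host directories to probe first
    download_url: Optional[str] = None  # fallback when nothing is found locally


@dataclass(frozen=True)
class ToolchainLocation:
    source: str  # "host" or "download"
    value: str   # absolute path for "host", URL for "download"


def resolve_toolchain(request: ToolchainRequest) -> ToolchainLocation:
    # Prefer a locally installed binary on the configured search paths.
    if request.search_paths:
        joined = os.pathsep.join(request.search_paths)
        found = shutil.which(request.binary_name, path=joined)
        if found:
            return ToolchainLocation("host", str(Path(found).resolve()))
    # Otherwise fall back to the downloadable distribution, if one is configured.
    if request.download_url:
        return ToolchainLocation("download", request.download_url)
    raise FileNotFoundError(
        f"{request.binary_name} not found on search paths and no download URL set"
    )
```

With this shape, clang/gcc would typically resolve as "host" while the embedded ARM toolchain would resolve as "download", all from the same set of options.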
f
Putting aside how to configure it, putting 5GB into the Pants cache is a lot of data. Especially if Pants has to expand it to disk every run (assuming immutable inputs are being used; if not, it would expand to disk for every process sandbox).
Expanding to disk caused performance issues in the early iterations of the Go backend, which downloaded the Go SDK. And that was on the order of 150 MB in size, let alone multiple GB.
w
I don't actually know if it's that big, that's what I've heard. The initial one was closer to 600-800MB for me, but if the cache would struggle with anything over 100MB, then I'll not worry about this for now. It would be nice to be able to specify a backend in a subsystem and have that pulled down, so it wouldn't need to be done out of band. I use docker containers for this right now, but having Pants do it would save that step
"Especially if Pants has to expand it to disk every run"
Is that what would happen? I thought that once it's in the ~/.cache/pants/blah directory - it's just referenced from there from the filesystem?
The idea being that the Subsystem could allow multiple versions of these toolchains, but they would just all use the same instances after the initial download and unpacking
f
At least for downloads using the `DownloadFile` intrinsic, the download goes directly into the Pants LMDB store (`~/.cache/pants/lmdb_store`), which is a database of blobs. It is stored there and not expanded to the actual filesystem.
(And this is what `ExternalTool`-based tools use to download.)
Thus it is a `Digest` that would have to be expanded to disk every time it is used.
`Process` has the notion of an "immutable input", which expands to disk once per Pants session and symlinks the expanded `Digest` into execution sandboxes. But to ensure no file corruption (say, if something were to modify the on-disk cache), it is expanded again each session.
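To make the cost model concrete, here is a toy model of that behavior in plain Python. It is not Pants code: `BlobStore` stands in for the LMDB store with an in-memory dict, `Session` stands in for a Pants session, and all names are made up for illustration. The point is only that the blob lives in the store unexpanded, gets written to disk once per session, and every sandbox just receives a symlink.

```python
import hashlib
import os
from pathlib import Path


class BlobStore:
    """Toy content-addressed store: bytes keyed by SHA-256, nothing expanded on disk."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class Session:
    """Toy session: expands each digest at most once, then symlinks it into sandboxes."""

    def __init__(self, store: BlobStore, root: Path) -> None:
        self.store = store
        self.immutable_dir = root / "immutable"
        self.immutable_dir.mkdir(parents=True)
        self.expansions = 0  # how many times a blob was written out to disk

    def _materialize(self, digest: str) -> Path:
        target = self.immutable_dir / digest
        if not target.exists():
            target.write_bytes(self.store.get(digest))
            self.expansions += 1
        return target

    def make_sandbox(self, sandbox: Path, name: str, digest: str) -> Path:
        sandbox.mkdir(parents=True)
        link = sandbox / name
        os.symlink(self._materialize(digest), link)
        return link
```

Creating many sandboxes in one `Session` leaves `expansions` at 1, but every new `Session` pays the write again, which is the part that hurts for multi-GB toolchains.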
There are alternate approaches. For JVM, JDK downloads are managed by Coursier, and Pants just invokes Coursier to download and expand JDK distributions.
(That is also the approach that I was favoring in the draft Rust backend PR: having Pants just rely on `rustup` to manage Rust toolchain downloads and expansion. https://github.com/tdyas/pants/blob/rust_backend/src/python/pants/backend/rust/util_rules/toolchains.py )
The CC backend could manage downloaded gcc distributions on its own in a manner more like Coursier and rustup. At the very least, it will probably need to forgo using `ExternalTool` to avoid performance issues.
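A minimal sketch of what "manage it on its own" could look like, assuming the distribution is a trusted tar archive already on local disk. The function name `ensure_unpacked`, the `.complete` marker file, and the per-version directory layout are all assumptions for illustration, not anything Coursier, rustup, or Pants actually does.

```python
import tarfile
from pathlib import Path


def ensure_unpacked(archive: Path, version: str, cache_root: Path) -> Path:
    """Unpack `archive` into cache_root/version once and reuse it afterwards."""
    install_dir = cache_root / version
    marker = install_dir / ".complete"  # written only after a full unpack
    if marker.exists():
        return install_dir  # an earlier run already unpacked this version
    install_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(install_dir)  # assumes a trusted archive
    marker.write_text(version)
    return install_dir
```

Multiple toolchain versions would then sit side by side under `cache_root`, and after the first unpack each run only checks for the marker instead of expanding gigabytes again. The trade-off is that nothing re-verifies the unpacked files, which is exactly the corruption risk the per-session expansion is guarding against.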
Pants does not really have a good answer currently for managing multi-GB tool distributions by itself.
w
Thanks for the info. This might incidentally explain a performance problem I was having in one of my other backends, where a lot of code was bundled together and downloaded as an ExternalTool or similar. I noticed it felt a bit slower to run sometimes... Need to re-investigate with this new info.