# development
b
If that's the case, I'm not sure if I should bother plumbing it to Coursier. Letting Coursier use its default global cache (`~/.cache/coursier`) seems equally safe, but with hypothetically better cache hits, unless I'm missing something here.
e
The issue here is remote caching. If you use the results of the coursier resolve in a subsequent Process you'll need to either stuff all the jars into LMDB or else record relative paths to the jars from the sandbox root. You can only do the latter with a named_cache unless you copy jars into the Process chroot.
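A minimal sketch of the "record relative paths" option, assuming the jars already sit somewhere under the sandbox root (for example via a named cache or a copy into the chroot). `sandbox_relative` is a hypothetical helper, not a Pants API. It refuses paths outside the sandbox, because an absolute path like `~/.cache/coursier/...` baked into a cached result would only be valid on the machine that produced it.

```python
from pathlib import PurePosixPath


def sandbox_relative(path: str, sandbox_root: str) -> str:
    """Return `path` relative to `sandbox_root`, or fail if it lies outside it.

    Hypothetical helper for illustration: a relative path stays meaningful when
    a cached result is replayed in a different sandbox or on another machine,
    while an absolute path does not.
    """
    p = PurePosixPath(path)
    root = PurePosixPath(sandbox_root)
    try:
        return p.relative_to(root).as_posix()
    except ValueError:
        raise ValueError(
            f"{path} is outside the sandbox {sandbox_root}; "
            "recording it would not be portable through a remote cache"
        ) from None
```

For example, `/tmp/sb/jars/a.jar` under `/tmp/sb` becomes `jars/a.jar`, while a jar in the user's global Coursier cache is rejected.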
We do not have a test harness for remote cache effects, but an OSS one exists and we hear about problems eventually. Just a sec for links. This caught a bug I introduced by storing an absolute path in a script saved in a Process chroot.
The 1st implementation of Pants v2 Python support used ~/.pex though, and it mostly works. So you could cheat for a while to get PRs landing and keep things more incremental.
b
OK, I think I was vaguely aware of this issue when I started on the Coursier work and went straight to the answer you started with: my remote invocation of Coursier has a wrapper script that snags all of the downloaded artifacts into the Process `output_dir`, and all later uses of those artifacts are in terms of Pants `Digest` and related engine filesystem APIs.
Basically, the global Coursier cache will allow some invocations of Coursier to go faster, but nothing on the pants side depends on Coursier having been run on the same machine as whatever later invocation consumes the fetched artifacts
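A rough sketch of what such a wrapper step might do, assuming it receives the absolute paths of the jars Coursier fetched into its global cache. `snag_artifacts` is a hypothetical name and this is not the actual wrapper script; it just copies each artifact into the Process output directory and records file names relative to that directory, so nothing downstream depends on the machine-local cache location.

```python
import shutil
from pathlib import Path


def snag_artifacts(fetched_paths, output_dir):
    """Copy fetched artifacts into output_dir; return their relative names.

    Hypothetical illustration: the global cache only speeds up fetching, while
    consumers see the copies in output_dir (which the engine would capture as
    a digest).
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    recorded = []
    for p in fetched_paths:
        src = Path(p)
        dest = out / src.name
        # Flattening by file name can collide (same artifact name, different
        # group); fail loudly rather than silently overwrite a jar.
        if dest.exists():
            raise FileExistsError(f"an artifact named {src.name} already exists in {out}")
        shutil.copy2(src, dest)
        recorded.append(src.name)
    return recorded
```

Flattening by file name keeps the sketch short; a real version would probably need a layout that disambiguates artifacts sharing a file name.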
e
OK. You should be good to go then. I know I found Pex was way slower this way. To be fair, I was storing the whole Pex cache in LMDB, but it cost ~500ms per chroot materialization to do so.
So an interesting spelunk down the road would be to see if named_caches buys any perf gain of interest.
Doing it the way you're doing it is ideally the fastest path going forward, though, because it's a much simpler story with remote execution and caching. Right now no remote execution server supports named_caches, for example.
b
Yeah, it seems like at worst I'm duplicating the union of all workers' global Coursier caches into lmdb, but in exchange I'm getting a very clean data model and very clear hermeticity guarantees
And my hunch is that in environments with warm caches, this will end up being about as fast as you can get
e
Yup, nice to start there and only get dirty if you need to later.