If that's the case, I'm not sure if I should bother plumbing Pants #development

bored-art-40741

04/10/2021, 9:45 PM

If that's the case, I'm not sure if I should bother plumbing it to Coursier. Letting Coursier use its default global cache (~/.cache/coursier) seems equally safe but with hypothetically better cache hits--unless I'm missing something here

enough-analyst-54434

04/10/2021, 10:24 PM

The issue here is remote caching. If you use the results of the coursier resolve in a subsequent Process you'll need to either stuff all the jars into LMDB or else record relative paths to the jars from the sandbox root. You can only do the latter with a named_cache unless you copy jars into the Process chroot.
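
To illustrate the relative-path point above, here is a minimal Python sketch. It is not Pants code: the function name `relativize_jars` and the example paths are hypothetical. It turns absolute jar paths into paths relative to a sandbox root and rejects anything outside that root, since such a path would not mean the same thing on another machine.

```python
from pathlib import PurePosixPath


def relativize_jars(sandbox_root, jar_paths):
    """Return each jar path relative to sandbox_root.

    Raises ValueError for any jar that lives outside the sandbox root,
    because such a path cannot be recorded in a machine-independent way.
    """
    root = PurePosixPath(sandbox_root)
    relative = []
    for jar in jar_paths:
        try:
            relative.append(str(PurePosixPath(jar).relative_to(root)))
        except ValueError:
            raise ValueError(
                f"{jar} is outside the sandbox root {sandbox_root}"
            ) from None
    return relative
```

A result recorded this way contains no machine-specific prefix, so it can be reused from a remote cache on a different worker.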

enough-analyst-54434

04/10/2021, 10:25 PM

We do not have a test harness for remote cache effects, but an OSS one exists and we hear about problems eventually. Just a sec for links. This caught a bug I introduced by storing an absolute path in a script saved in a Process chroot.

enough-analyst-54434

04/10/2021, 10:25 PM

https://gitlab.com/remote-apis-testing/remote-apis-testing

enough-analyst-54434

04/10/2021, 10:28 PM

Example problem and solution was: https://github.com/pantsbuild/pants/issues/11753 https://github.com/pantsbuild/pants/pull/11760

enough-analyst-54434

04/10/2021, 10:30 PM

The 1st implementation of Pants v2 Python support used ~/.pex though and it mostly works. So you could cheat for a while to start to get PRs landing and things more incremental.

bored-art-40741

04/10/2021, 10:32 PM

OK, I think I was vaguely aware of this issue when I started on the Coursier work and went straight to the answer you started with: my remote invocation of Coursier has a wrapper script that snags all of the downloaded artifacts into the Process output_dir, and all later uses of those artifacts are in terms of Pants Digest and related engine filesystem APIs
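
The wrapper script itself is not shown in this thread. As a rough sketch of the idea only, assuming the resolver reports the absolute paths of the files it fetched, a Python helper (the name `collect_artifacts` is hypothetical) could copy each artifact into the output directory so that later steps refer to it by a name relative to that directory:

```python
import shutil
from pathlib import Path


def collect_artifacts(artifact_paths, output_dir):
    """Copy fetched artifacts into output_dir and return their file names.

    artifact_paths: absolute paths reported by the resolver (assumed input).
    Returns the sorted names of the copies, relative to output_dir.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    collected = []
    for src in map(Path, artifact_paths):
        dest = out / src.name
        if dest.exists():
            # Two artifacts with the same file name would overwrite each other.
            raise FileExistsError(f"duplicate artifact name: {src.name}")
        shutil.copy2(src, dest)
        collected.append(src.name)
    return sorted(collected)
```

After this step nothing downstream needs the resolver's own cache directory; only the contents of the output directory matter.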

bored-art-40741

04/10/2021, 10:34 PM

Basically, the global Coursier cache will allow some invocations of Coursier to go faster, but nothing on the pants side depends on Coursier having been run on the same machine as whatever later invocation consumes the fetched artifacts

enough-analyst-54434

04/10/2021, 10:35 PM

OK. You should be good to go then. I know I found Pex was way slower this way. To be fair I was storing the whole pex cache in lmdb, but it cost ~500ms per chroot materialization to do so.

enough-analyst-54434

04/10/2021, 10:36 PM

So an interesting spelunk down the road would be to see if named_caches buys any perf gain of interest.

enough-analyst-54434

04/10/2021, 10:38 PM

Doing it the way you're doing it is ideally the fastest path going forward, though, because it's a much simpler story with remote execution and caching. Right now no re-exec server supports named_caches, for example.

bored-art-40741

04/10/2021, 10:40 PM

Yeah, it seems like at worst I'm duplicating the union of all workers' global Coursier caches into lmdb, but in exchange I'm getting a very clean data model and very clear hermeticity guarantees

bored-art-40741

04/10/2021, 10:40 PM

And my hunch is that in environments with warm caches, this will end up being about as fast as you can get

enough-analyst-54434

04/10/2021, 10:40 PM

Yup - nice to start there and only get dirty if you need to later.
