w

witty-crayon-22786

05/16/2020, 5:25 AM
sketch of a design for “mutable” (somewhat) process caches: https://docs.google.com/document/d/1n_MVVGjrkTKTPKHqRPlyfFzQyx2QioclMG_Q3DMUgYk/edit?usp=sharing (accessible to pants-devel@) … feedback welcome!
hey folks: got some good feedback on this, and ended up overhauling it to move processes that have self-referential caches (read and written for the current target) out of scope. we’ll likely still do something in that area pre-2.0, but it has a very different solution from resolvers.
👍 2
@enough-analyst-54434: regarding remoting
it’s briefly discussed at the very end
cc @average-vr-56795, @fast-nail-55400
but the rough idea is to use sidechannel information in the remoting protocol to signal that caches should be mounted
ie, either platform information or environment variables
e

enough-analyst-54434

05/22/2020, 6:31 PM
But that needs to be official, right? Without an API you need to re-set up / renegotiate that backdoor with each remoting provider.
w

witty-crayon-22786

05/22/2020, 6:31 PM
correct.
the API in the PR sets the expectation that processes should always check whether the environment variable is set: if it isn’t, it’s because the executor doesn’t support the feature
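A minimal sketch of the check described here, from the process's side. The variable name `EXAMPLE_NAMED_CACHE_DIR` is a placeholder for illustration; the chat does not say what the PR actually calls it.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, used only for this illustration.
CACHE_ENV_VAR = "EXAMPLE_NAMED_CACHE_DIR"


def named_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the cache directory provided by the executor, or None.

    An unset (or empty) variable is treated as "the executor does not
    support the feature", so callers must handle None.
    """
    value = env.get(CACHE_ENV_VAR)
    return value or None
```

A tool would call `named_cache_dir()` once at startup and run without a persistent cache when it returns `None`.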
e

enough-analyst-54434

05/22/2020, 6:32 PM
OK, so that paired with the observation you can get local caches by just being non-hermetic (we did this by mistake in early v2 python and used ~/.pex) makes me want to slow down and maybe get the remote protocol addressed 1st.
Basically, fwict the current PR gets you nothing substantially different from just being non-hermetic in the Process request.
f

fast-nail-55400

05/22/2020, 6:35 PM
@witty-crayon-22786: have we reached out to the remote execution working group at all to see if others have similar use cases?
w

witty-crayon-22786

05/22/2020, 6:36 PM
i’ll be back in 5 minutes, sorry
(argh, who calls people on the phone)
e

enough-analyst-54434

05/22/2020, 6:38 PM
NP. To give a concrete example: if we set up our Pex processes today to use, say, ~/.cache/pants/pex, then we'd have a local shared cache and the associated perf improvements. That would degrade gracefully in strict remote execution environments like toolchains, falling back to a throwaway temporary-dir cache with a warning.
I'd like to do this right however ... but it seems that hinges on the remote API.
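The graceful degradation described above could look roughly like this. It is a sketch only: `choose_cache_dir` is a hypothetical helper, not something from Pants or Pex, and the preferred path is whatever the caller passes in.

```python
import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def choose_cache_dir(preferred: Path) -> Path:
    """Use the shared cache dir if it is usable, else a throwaway temp dir."""
    try:
        preferred.mkdir(parents=True, exist_ok=True)
        if os.access(preferred, os.W_OK):
            return preferred
    except OSError:
        # For example a read-only or missing home dir in a strict sandbox.
        pass
    fallback = Path(tempfile.mkdtemp(prefix="throwaway-cache-"))
    logger.warning("Cache dir %s is unusable; falling back to %s", preferred, fallback)
    return fallback
```

For the example in the chat, the call would be `choose_cache_dir(Path.home() / ".cache" / "pants" / "pex")`.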
f

fast-nail-55400

05/22/2020, 6:39 PM
pex-specific: can we run a remote execution request to convert a pex into its unpacked form and then just use the output digest for the unpacked form as an input digest in subsequent requests?
e

enough-analyst-54434

05/22/2020, 6:40 PM
Yes, this was the experimental work I did last time I was in town.
f

fast-nail-55400

05/22/2020, 6:40 PM
or here’s a thought: decompose a cache into unique digests that can be composed later to form a cache
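One way to picture that idea, as a sketch with hypothetical helper names (this is not the Pants or REAPI digest format): each cache entry gets its own content digest, and a cache is identified by the set of entry digests it is composed from.

```python
import hashlib
from typing import Iterable, Mapping


def entry_digest(files: Mapping[str, bytes]) -> str:
    """Digest one cache entry (e.g. one unpacked dependency) by path and content."""
    h = hashlib.sha256()
    for path in sorted(files):
        encoded = path.encode()
        data = files[path]
        # Length prefixes keep (path, content) boundaries unambiguous.
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def compose(digests: Iterable[str]) -> str:
    """Digest of a cache composed from entries; independent of entry order."""
    h = hashlib.sha256()
    for digest in sorted(set(digests)):
        h.update(bytes.fromhex(digest))
    return h.hexdigest()
```

With this shape, changing one entry changes only that entry's digest and the composed digest; the other entries keep their digests and can be reused.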
e

enough-analyst-54434

05/22/2020, 6:41 PM
Yes, but this requires elevating - do not fingerprint this input - to the API level.
That's the fundamental sticky wicket.
It's just not performant for the cold case at all. Some improvement for warm cases where an individual 3rdparty dep is perturbed.
w

witty-crayon-22786

05/22/2020, 6:46 PM
(back, sorry)
@fast-nail-55400: the other items you mention all amount to “invoking the tool recursively”
which, where possible, is a good idea.
but as mentioned in the doc, there are use cases for which that results in redundant work
(as John said)
e

enough-analyst-54434

05/22/2020, 6:49 PM
OK - So Stu - you view this local change set as prep for the remoting API - the result here will be proposed there?
If so, I think that makes sense and can hammer on the local API as if it will also be the remote API.
w

witty-crayon-22786

05/22/2020, 6:49 PM
if there is rough agreement on the side-channel approach, i think that we can fast-follow this change with one that starts sending signals in remote execution as well
call it experimental, see how it works, start the thread with the group
yep
e

enough-analyst-54434

05/22/2020, 6:50 PM
K. I think side-channel is the wrong thing though. It would be better to have the same new slot in the remote process API as in the new Process
No need for a side-channel, this will have to be 1st class.
w

witty-crayon-22786

05/22/2020, 6:50 PM
ah. yea, if the group agrees, then certainly.
i’m expecting that to be an … extended process.
e

enough-analyst-54434

05/22/2020, 6:51 PM
OK - good. The env var thing is clunky which prompted this line of thought.
It felt like a backdoor dance that could be more straightforward and declarative.
w

witty-crayon-22786

05/22/2020, 6:51 PM
it is, yea. because i think that it needs to be significantly more collision resistant on a cluster than it needs to be locally.
f

fast-nail-55400

05/22/2020, 6:53 PM
also need to know how to deal with a corrupted cache
w

witty-crayon-22786

05/22/2020, 6:53 PM
… or maybe not. maybe even locally we should be defending more carefully against collisions.
@rule x declaring a cache named “pex_root”, which @rule y does as well
@fast-nail-55400: yep
f

fast-nail-55400

05/22/2020, 6:53 PM
does an executor need to signal to the cluster that a cache is corrupt and should not be reused?
so now you need an output side-channel to convey that
w

witty-crayon-22786

05/22/2020, 6:54 PM
no
e

enough-analyst-54434

05/22/2020, 6:54 PM
I don't think so. Just a mixin suffix.
☝️ 1
f

fast-nail-55400

05/22/2020, 6:54 PM
where the REAPI only has exit code and output paths
w

witty-crayon-22786

05/22/2020, 6:54 PM
right ^ what John said
f

fast-nail-55400

05/22/2020, 6:54 PM
mixin suffix?
e

enough-analyst-54434

05/22/2020, 6:54 PM
to the cache name
pex_root-v1
w

witty-crayon-22786

05/22/2020, 6:55 PM
… without actually renaming it, ideally. but yea.
e

enough-analyst-54434

05/22/2020, 6:55 PM
Then bump v1 to v2 as an option
w

witty-crayon-22786

05/22/2020, 6:55 PM
pants already has a “remote execution namespace” that we mix in as an env var. could use the same thing here, or something else.
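A sketch of the mixin idea, assuming the declared cache name stays as-is and a version plus namespace are mixed into the on-disk directory name. The function and parameter names are illustrative, not Pants APIs.

```python
import hashlib


def cache_dir_name(name: str, version: str = "v1", namespace: str = "") -> str:
    """Derive a storage directory name without renaming the declared cache.

    Bumping `version` (or changing `namespace`) yields a fresh directory,
    which abandons a poisoned or corrupted cache.
    """
    # NUL separator so ("a", "bc") and ("ab", "c") do not collide.
    mixin = hashlib.sha256(f"{namespace}\0{version}".encode()).hexdigest()[:12]
    return f"{name}-{mixin}"
```

Rules would keep declaring `pex_root`; bumping `version` from `v1` to `v2` via an option changes only the derived directory name.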
f

fast-nail-55400

05/22/2020, 6:55 PM
that requires human intervention, correct? what I am suggesting is that the code itself direct the cluster to throw away that instance of the cache
e

enough-analyst-54434

05/22/2020, 6:56 PM
Ah yes, it does. I don't think we've hit upon the auto-detect case yet.
The manual case has been humans not a bug in use or construction of caches.
s/not/note/
f

fast-nail-55400

05/22/2020, 6:57 PM
there are two sorts of corruption: cache poisoning, where the content is bad but the cache structure is fine, and corruption of the cache structure itself
e

enough-analyst-54434

05/22/2020, 6:57 PM
OK, thanks Stu - this helps. I think I'm good with this all save the env var plumbing. That seems clunkier than need be.
w

witty-crayon-22786

05/22/2020, 6:57 PM
@enough-analyst-54434: env vars in the context of remoting? or for local execution as well?
e

enough-analyst-54434

05/22/2020, 6:57 PM
The use of env vars to tell the process where the cache dir is.
That seems clunky for local and remote.
w

witty-crayon-22786

05/22/2020, 6:58 PM
hm. that part feels inevitable though… is there an alternative?
e

enough-analyst-54434

05/22/2020, 6:58 PM
I recommended one on the PR, yeah.
👍 1
w

witty-crayon-22786

05/22/2020, 6:58 PM
oh, cool beans, heh.
@enough-analyst-54434: do you want to take a look at https://github.com/pantsbuild/pants/pull/9852 before i land it?
ended up applying the symlink-to-default-location-based-on-a-set-of-caches approach
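Roughly what a symlink-based approach could look like, as a sketch under assumptions: the names `link_caches`, `store`, and `sandbox` are made up here and this is not the code from the PR. Each declared cache lives in a shared store, and the process sandbox gets a symlink at the tool's default location.

```python
import os
from pathlib import Path
from typing import Mapping


def link_caches(store: Path, sandbox: Path, caches: Mapping[str, str]) -> None:
    """Symlink each named cache into the sandbox.

    `caches` maps a cache name to the sandbox-relative path where the tool
    looks by default. `store` should be an absolute path so the links
    resolve from anywhere.
    """
    for name, relative_dest in caches.items():
        source = store / name
        source.mkdir(parents=True, exist_ok=True)
        dest = sandbox / relative_dest
        dest.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(source, dest, target_is_directory=True)
```

The process then needs no environment variable: it writes to its default location, and the writes land in the shared store.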
e

enough-analyst-54434

05/26/2020, 10:42 PM
Aha - looking.