# development
w
sketch of a design for “mutable” (somewhat) process caches: https://docs.google.com/document/d/1n_MVVGjrkTKTPKHqRPlyfFzQyx2QioclMG_Q3DMUgYk/edit?usp=sharing (accessible to pants-devel@) … feedback welcome!
hey folks: got some good feedback on this, and ended up overhauling it to move processes that have self-referential caches (read and written for the current target) out of scope. we’ll likely still do something in that area pre-2.0, but it has a very different solution from resolvers.
👍 2
@enough-analyst-54434: regarding remoting
it’s briefly discussed at the very end
cc @average-vr-56795, @fast-nail-55400
but the rough idea is to use sidechannel information in the remoting protocol to signal that caches should be mounted
ie, either platform information or environment variables
e
But that needs to be official, right? Without an API you'd need to re-set up / re-negotiate that backdoor with each remoting provider.
w
correct.
the API in the PR sets the expectation that processes should always check whether the environment variable is set: if it isn’t, it’s because the executor doesn’t support the feature
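(a minimal sketch of that check-then-fall-back pattern; the env var name and paths below are hypothetical, not the actual names from the PR:)
```python
import logging
import os
import tempfile

# Hypothetical env var: an executor that supports mounted named caches would set it.
cache_root = os.environ.get("EXECUTOR_NAMED_CACHE_ROOT")
if cache_root is None:
    # The executor doesn't support the feature: degrade to a throwaway cache.
    cache_root = tempfile.mkdtemp(prefix="throwaway-cache-")
    logging.warning("No mounted cache available; using a temporary, non-persistent cache.")

pex_root = os.path.join(cache_root, "pex_root")
os.makedirs(pex_root, exist_ok=True)
```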
e
OK, so that, paired with the observation that you can get local caches just by being non-hermetic (we did this by mistake in early v2 python and used ~/.pex), makes me want to slow down and maybe get the remote protocol addressed 1st.
Basically, from what I can tell the current PR gets you nothing substantially different from just being non-hermetic in the Process request.
f
@witty-crayon-22786: have we reached out to the remote execution working group at all to see if others have similar use cases?
w
i’ll be back in 5 minutes, sorry
(argh, who calls people on the phone)
e
NP. To give a concrete example, if we set up our Pex processes today to use ~/.cache/pants/pex, say, then we'd have a local shared cache and the associated perf improvements, and that would degrade gracefully in strict remote execution environments like toolchains, falling back to a throwaway temporary-dir cache with a warning.
I'd like to do this right, however ... but it seems that hinges on the remote API.
f
pex-specific: can we run a remote execution request to convert a pex into its unpacked form and then just use the output digest for the unpacked form as an input digest in subsequent requests?
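(roughly, in Pants-rule style; the field names are approximate and the pex flags and entry point are invented for illustration:)
```python
from pants.engine.fs import Digest
from pants.engine.process import Process, ProcessResult
from pants.engine.rules import Get, rule

@rule
async def run_against_unpacked_pex(pex_digest: Digest) -> ProcessResult:
    # One execution request that unpacks the pex into an output directory.
    unpacked = await Get(
        ProcessResult,
        Process(
            argv=("./app.pex", "--unpack-into", "unpacked"),  # hypothetical flags
            input_digest=pex_digest,
            output_directories=("unpacked",),
            description="Unpack the pex once",
        ),
    )
    # The output digest of the unpacked form becomes the input digest of later
    # requests, so the unpacking is cached by digest rather than repeated per process.
    return await Get(
        ProcessResult,
        Process(
            argv=("unpacked/run.py",),  # hypothetical entry point
            input_digest=unpacked.output_digest,
            description="Run against the pre-unpacked pex",
        ),
    )
```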
e
Yes, this was the experimental work I did last time I was in town.
f
or here’s a thought: decompose a cache into unique digests that can be composed later to form a cache
e
Yes, but this requires elevating “do not fingerprint this input” to the API level.
That's the fundamental sticky wicket.
It's just not performant for the cold case at all. Some improvement for warm cases where an individual 3rdparty dep is perturbed.
w
(back, sorry)
@fast-nail-55400: the other items you mention all amount to “invoking the tool recursively”
which, where possible, is a good idea.
but as mentioned in the doc, there are use cases for which that results in redundant work
(as John said)
e
OK - So Stu - you view this local change set as prep for the remoting API - the result here will be proposed there?
If so, I think that makes sense and can hammer on the local API as if it will also be the remote API.
w
if there is rough agreement on the side-channel approach, i think that we can fast-follow this change with one that starts sending signals in remote execution as well
call it experimental, see how it works, start the thread with the group
yep
e
K. I think side-channel is the wrong thing, though. It would be better to have the same new slot in the remote process API as in the new Process.
No need for a side-channel; this will have to be 1st class.
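(to illustrate the contrast: a hypothetical first-class slot, declared on the Process itself and mirrored into the remote request rather than smuggled through env vars; the names here are invented:)
```python
from dataclasses import dataclass, field
from typing import Mapping, Tuple

@dataclass(frozen=True)
class Process:
    """Hypothetical sketch: cache declarations as a first-class, declarative field."""
    argv: Tuple[str, ...]
    description: str
    # cache name -> relative path at which the executor should mount it.
    append_only_caches: Mapping[str, str] = field(default_factory=dict)

# The same slot would be carried verbatim into the remote execution request,
# so both local and remote executors see one declarative API.
p = Process(
    argv=("pex", "..."),
    description="resolve requirements",
    append_only_caches={"pex_root": ".cache/pex_root"},
)
```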
w
ah. yea, if the group agrees, then certainly.
i’m expecting that to be an … extended process.
e
OK - good. The env var thing is clunky which prompted this line of thought.
It felt like a backdoor dance that could be more straightforward and declarative.
w
it is, yea. because i think that it needs to be significantly more collision resistant on a cluster than it needs to be locally.
f
also need to know how to deal with a corrupted cache
w
… or maybe not. maybe even locally we should be defending more carefully against collisions.
@rule x declaring a cache named “pex_root”, which @rule y does as well
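(one way to defend against that kind of collision, sketched with invented names: require that rules declaring the same cache name agree on its definition:)
```python
from typing import Dict

_declared_caches: Dict[str, str] = {}

def declare_cache(name: str, path: str) -> str:
    """Hypothetical collision defense: conflicting re-declarations are an error."""
    existing = _declared_caches.get(name)
    if existing is not None and existing != path:
        raise ValueError(
            f"Cache {name!r} is already declared with path {existing!r}, "
            f"which conflicts with {path!r}."
        )
    _declared_caches[name] = path
    return name

declare_cache("pex_root", ".cache/pex_root")  # @rule x
declare_cache("pex_root", ".cache/pex_root")  # @rule y agrees, so this is fine
```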
also need to know how to deal with a corrupted cache
@fast-nail-55400: yep
f
does an executor need to signal to the cluster that a cache is corrupt and should not be reused?
so now you need an output side-channel to convey that
w
no
e
I don't think so. Just a mixin suffix
☝️ 1
f
where the REAPI only has exit code and output paths
w
right ^ what John said
f
mixin suffix?
e
to the cache name
pex_root-v1
w
… without actually renaming it, ideally. but yea.
e
Then bump v1 to v2 as an option
w
pants already has a “remote execution namespace” that we mix in as an env var. could use the same thing here, or something else.
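(a small sketch of the mixin idea: the logical name stays “pex_root” in rules, while the effective key mixes in a bumpable version and, optionally, an execution namespace; the names and format are hypothetical:)
```python
def effective_cache_name(logical_name: str, version: int = 1, namespace: str = "") -> str:
    """Hypothetical: mix a version (and optional namespace) into the cache key."""
    parts = [logical_name, f"v{version}"]
    if namespace:
        parts.append(namespace)
    return "-".join(parts)

assert effective_cache_name("pex_root") == "pex_root-v1"
# Bumping the version (or changing the namespace) abandons the old cache instance:
assert effective_cache_name("pex_root", version=2) == "pex_root-v2"
```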
f
that requires human intervention, correct? what I am suggesting is that the code itself direct the cluster to throw away that instance of the cache
e
Ah yes, it does. I don't think we've hit upon the auto-detect case yet.
The manual case has been humans noting a bug in the use or construction of caches.
f
there are two sorts of corruption: cache poisoning, where the content is bad but the cache structure is fine, and corruption of the cache structure itself
e
OK, thanks Stu - this helps. I think I'm good with all of this save the env var plumbing. That seems clunkier than it needs to be.
w
@enough-analyst-54434: env vars in the context of remoting? or for local execution as well?
e
The use of env vars to tell the process where the cache dir is.
That seems clunky for local and remote.
w
hm. that part feels inevitable though… is there an alternative?
e
I recommended one on the PR, yeah.
👍 1
w
oh, cool beans, heh.
@enough-analyst-54434: do you want to take a look at https://github.com/pantsbuild/pants/pull/9852 before i land it?
ended up applying the symlink-to-default-location-based-on-a-set-of-caches approach
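(the rough shape of that approach, with hypothetical paths: materialize each named cache under a managed base directory, then symlink it to the default location the tool expects:)
```python
import os
from typing import Mapping

def link_named_caches(cache_base: str, caches: Mapping[str, str]) -> None:
    """Hypothetical sketch: symlink each named cache into its default location."""
    for name, default_location in caches.items():
        src = os.path.join(cache_base, name)
        os.makedirs(src, exist_ok=True)
        dst = os.path.expanduser(default_location)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if not os.path.lexists(dst):
            os.symlink(src, dst)

# Example (paths hypothetical):
link_named_caches("/tmp/named_caches", {"pex_root": "~/.cache/pants/pex_root"})
```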
e
Aha - looking.