sketch of a design for “mutable” (somewhat) process caches (Pants #development)

witty-crayon-22786

05/16/2020, 5:25 AM

sketch of a design for “mutable” (somewhat) process caches: https://docs.google.com/document/d/1n_MVVGjrkTKTPKHqRPlyfFzQyx2QioclMG_Q3DMUgYk/edit?usp=sharing (accessible to pants-devel@) … feedback welcome!

witty-crayon-22786

05/19/2020, 9:20 PM

hey folks: got some good feedback on this, and ended up overhauling it to move processes that have self-referential caches (read and written for the current target) out of scope. we’ll likely still do something in that area pre-2.0, but it has a very different solution from resolvers.

👍 2

witty-crayon-22786

05/22/2020, 6:29 PM

@enough-analyst-54434: regarding remoting

witty-crayon-22786

05/22/2020, 6:29 PM

it’s briefly discussed at the very end

witty-crayon-22786

05/22/2020, 6:29 PM

cc @average-vr-56795, @fast-nail-55400

witty-crayon-22786

05/22/2020, 6:30 PM

but the rough idea is to use side-channel information in the remoting protocol to signal that caches should be mounted

witty-crayon-22786

05/22/2020, 6:30 PM

ie, either platform information or environment variables

enough-analyst-54434

05/22/2020, 6:31 PM

But that needs to be official, right? Without an API you need to re-setup / re-negotiate that backdoor with each remoting provider.

witty-crayon-22786

05/22/2020, 6:31 PM

correct.

witty-crayon-22786

05/22/2020, 6:32 PM

the API in the PR sets the expectation that processes should always check whether the environment variable is set: if it isn’t, it’s because the executor doesn’t support the feature
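A minimal sketch of the check described above, from the point of view of the process being run. The variable name `EXAMPLE_NAMED_CACHE_DIR` and the helper `resolve_cache_dir` are hypothetical and are not taken from the PR; the temporary-directory fallback is one possible way to degrade when the executor does not support the feature.

```python
import os
import tempfile
import warnings

# Hypothetical variable name, used only for illustration (not from the PR).
CACHE_ENV_VAR = "EXAMPLE_NAMED_CACHE_DIR"


def resolve_cache_dir(environ=os.environ):
    """Return (cache_dir, is_shared).

    If the executor set the variable, use the directory it points at.
    If it is unset, assume the executor does not support named caches
    and fall back to a throwaway temporary directory.
    """
    cache_dir = environ.get(CACHE_ENV_VAR)
    if cache_dir:
        os.makedirs(cache_dir, exist_ok=True)
        return cache_dir, True

    warnings.warn(
        f"{CACHE_ENV_VAR} is not set; using a temporary, non-shared cache."
    )
    return tempfile.mkdtemp(prefix="throwaway-cache-"), False
```

The tool still produces correct results either way; only the warm-cache speedup is lost when the variable is missing.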

enough-analyst-54434

05/22/2020, 6:32 PM

OK, so that paired with the observation you can get local caches by just being non-hermetic (we did this by mistake in early v2 python and used ~/.pex) makes me want to slow down and maybe get the remote protocol addressed 1st.

enough-analyst-54434

05/22/2020, 6:33 PM

Basically, fwict the current PR gets you nothing substantially different than just being non-hermetic in the Process request.

fast-nail-55400

05/22/2020, 6:35 PM

@witty-crayon-22786: have we reached out to the remote execution working group at all to see if others have similar use cases?

witty-crayon-22786

05/22/2020, 6:36 PM

i’ll be back in 5 minutes, sorry

witty-crayon-22786

05/22/2020, 6:38 PM

(argh, who calls people on the phone)

enough-analyst-54434

05/22/2020, 6:38 PM

NP. To give a concrete example: if we set up our Pex processes today to use, say, ~/.cache/pants/pex, then we'd have a local shared cache and the associated perf improvements. That would degrade gracefully in strict remote execution environments like toolchains, falling back to a throwaway temporary dir cache with a warning.

enough-analyst-54434

05/22/2020, 6:39 PM

I'd like to do this right however ... but it seems that hinges on the remote API.

fast-nail-55400

05/22/2020, 6:39 PM

pex-specific: can we run a remote execution request to convert a pex into its unpacked form and then just use the output digest for the unpacked form as an input digest in subsequent requests?

enough-analyst-54434

05/22/2020, 6:40 PM

Yes, this was the experimental work I did last time I was in town.

fast-nail-55400

05/22/2020, 6:40 PM

or here’s a thought: decompose a cache into unique digests that can be composed later to form a cache

enough-analyst-54434

05/22/2020, 6:41 PM

Yes, but this requires elevating - do not fingerprint this input - to the API level.

enough-analyst-54434

05/22/2020, 6:41 PM

That's the fundamental sticky wicket.

enough-analyst-54434

05/22/2020, 6:44 PM

@fast-nail-55400 on decompose, that's here: 1. save component cache: https://github.com/pantsbuild/pants/pull/9747/files#diff-46bc84d0605912aad51eaf43d4bbee4fR181 2. merge a bunch together: https://github.com/pantsbuild/pants/pull/9747/files#diff-46bc84d0605912aad51eaf43d4bbee4fR281

enough-analyst-54434

05/22/2020, 6:45 PM

It's just not performant for the cold case at all. Some improvement for warm cases where an individual 3rdparty dep is perturbed.
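The decompose-and-merge idea above can be sketched roughly as follows. This is an in-memory illustration only, not the code in the linked PR: `ComponentStore`, `digest_of`, and the file-map representation are all hypothetical stand-ins for real directory digests.

```python
import hashlib
from typing import Dict, Iterable


def digest_of(files: Dict[str, bytes]) -> str:
    """Content-address a component: hash its paths and file contents."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


class ComponentStore:
    """Hypothetical store of per-dependency cache components."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, Dict[str, bytes]] = {}

    def save(self, files: Dict[str, bytes]) -> str:
        # 1. Save one component (e.g. one 3rdparty dep) under its digest.
        digest = digest_of(files)
        self._by_digest.setdefault(digest, dict(files))
        return digest

    def merge(self, digests: Iterable[str]) -> Dict[str, bytes]:
        # 2. Merge a bunch of components together to form a cache.
        merged: Dict[str, bytes] = {}
        for digest in digests:
            for path, content in self._by_digest[digest].items():
                if path in merged and merged[path] != content:
                    raise ValueError(f"conflicting content for {path}")
                merged[path] = content
        return merged
```

When a single dependency changes, only its component gets a new digest and the others are reused, which matches the warm-case improvement mentioned above. A cold run still has to build and merge every component.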

witty-crayon-22786

05/22/2020, 6:46 PM

(back, sorry)

witty-crayon-22786

05/22/2020, 6:46 PM

https://pantsbuild.slack.com/archives/C0D7TNJHL/p1590172557408600?thread_ts=1589606724.266500&cid=C0D7TNJHL no, but we should once we have a good idea of what it will look like.

witty-crayon-22786

05/22/2020, 6:47 PM

@fast-nail-55400: the other items you mention all amount to “invoking the tool recursively”

witty-crayon-22786

05/22/2020, 6:47 PM

which, where possible, is a good idea.

witty-crayon-22786

05/22/2020, 6:48 PM

but as mentioned in the doc, there are use cases for which that results in redundant work

witty-crayon-22786

05/22/2020, 6:48 PM

(as John said)

enough-analyst-54434

05/22/2020, 6:49 PM

OK - So Stu - you view this local change set as prep for the remoting API - the result here will be proposed there?

enough-analyst-54434

05/22/2020, 6:49 PM

If so, I think that makes sense and can hammer on the local API as if it will also be the remote API.

witty-crayon-22786

05/22/2020, 6:49 PM

if there is rough agreement on the side-channel approach, i think that we can fast-follow this change with one that starts sending signals in remote execution as well

witty-crayon-22786

05/22/2020, 6:49 PM

call it experimental, see how it works, start the thread with the group

witty-crayon-22786

05/22/2020, 6:50 PM

yep

enough-analyst-54434

05/22/2020, 6:50 PM

K. I think side-channel is the wrong thing though. It would be better to have the same new slot in the remote process API as in the new Process

enough-analyst-54434

05/22/2020, 6:50 PM

No need for a side-channel, this will have to be 1st class.

witty-crayon-22786

05/22/2020, 6:50 PM

ah. yea, if the group agrees, then certainly.

witty-crayon-22786

05/22/2020, 6:51 PM

i’m expecting that to be an … extended process.

enough-analyst-54434

05/22/2020, 6:51 PM

OK - good. The env var thing is clunky which prompted this line of thought.

enough-analyst-54434

05/22/2020, 6:51 PM

It felt like a backdoor dance that could be more straightforward and declarative.

witty-crayon-22786

05/22/2020, 6:51 PM

it is, yea. because i think that it needs to be significantly more collision resistant on a cluster than it needs to be locally.

fast-nail-55400

05/22/2020, 6:53 PM

also need to know how to deal with a corrupted cache

witty-crayon-22786

05/22/2020, 6:53 PM

… or maybe not. maybe even locally we should be defending more carefully against collisions.

@rule x declaring a cache named “pex_root”, which @rule y does as well

witty-crayon-22786

05/22/2020, 6:53 PM

> also need to know how to deal with a corrupted cache

@fast-nail-55400: yep

fast-nail-55400

05/22/2020, 6:53 PM

does an executor need to signal to the cluster that a cache is corrupt and should not be reused?

fast-nail-55400

05/22/2020, 6:54 PM

so now you need an output side-channel to convey that

enough-analyst-54434

05/22/2020, 6:54 PM

I don't think so. Just a mixin suffix

☝️ 1

fast-nail-55400

05/22/2020, 6:54 PM

where the REAPI only has exit code and output paths

witty-crayon-22786

05/22/2020, 6:54 PM

right ^ what John said

fast-nail-55400

05/22/2020, 6:54 PM

mixin suffix?

enough-analyst-54434

05/22/2020, 6:54 PM

to the cache name

enough-analyst-54434

05/22/2020, 6:54 PM

pex_root-v1

witty-crayon-22786

05/22/2020, 6:55 PM

… without actually renaming it, ideally. but yea.

enough-analyst-54434

05/22/2020, 6:55 PM

Then bump v1 to v2 as an option

witty-crayon-22786

05/22/2020, 6:55 PM

pants already has a “remote execution namespace” that we mix in as an env var. could use the same thing here, or something else.
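One way to picture the suffix/namespace idea is to keep the logical cache name fixed and mix a namespace into the on-disk location instead. This is a sketch under assumptions: `cache_dir_for` and its parameters are hypothetical, and it is not how Pants derives cache paths.

```python
import hashlib
import os


def cache_dir_for(base_dir: str, name: str, namespace: str = "") -> str:
    """Map a logical cache name plus a namespace to a directory.

    The logical name (e.g. "pex_root") never changes; bumping the
    namespace (e.g. "v1" -> "v2") yields a fresh, empty location.
    """
    key = hashlib.sha256(f"{name}\0{namespace}".encode()).hexdigest()[:16]
    return os.path.join(base_dir, f"{name}-{key}")
```

Usage: `cache_dir_for("/caches", "pex_root", "v1")` and `cache_dir_for("/caches", "pex_root", "v2")` return different directories, so a bad cache is abandoned by bumping the namespace rather than by renaming the cache.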

fast-nail-55400

05/22/2020, 6:55 PM

that requires human intervention, correct? what I am suggesting is that the code itself direct the cluster to throw away that instance of the cache

enough-analyst-54434

05/22/2020, 6:56 PM

Ah yes, it does. I don't think we've hit upon the auto-detect case yet.

enough-analyst-54434

05/22/2020, 6:56 PM

The manual case has been humans not a bug in use or construction of caches.

enough-analyst-54434

05/22/2020, 6:56 PM

s/not/note/

fast-nail-55400

05/22/2020, 6:57 PM

there’s two sorts of corruption: cache poisoning with bad content but the cache structure is fine, and corruption of the cache structure itself

enough-analyst-54434

05/22/2020, 6:57 PM

OK, thanks Stu - this helps. I think I'm good with this all save the env var plumbing. That seems clunkier than need be.

witty-crayon-22786

05/22/2020, 6:57 PM

@enough-analyst-54434: env vars in the context of remoting? or for local execution as well?

enough-analyst-54434

05/22/2020, 6:57 PM

The use of env vars to tell the process where the cache dir is.

enough-analyst-54434

05/22/2020, 6:58 PM

That seems clunky for local and remote.

witty-crayon-22786

05/22/2020, 6:58 PM

hm. that part feels inevitable though… is there an alternative?

enough-analyst-54434

05/22/2020, 6:58 PM

I recommended one on the PR, yeah.

👍 1

witty-crayon-22786

05/22/2020, 6:58 PM

oh, cool beans, heh.

witty-crayon-22786

05/26/2020, 10:41 PM

@enough-analyst-54434: do you want to take a look at https://github.com/pantsbuild/pants/pull/9852 before i land it?

witty-crayon-22786

05/26/2020, 10:41 PM

ended up applying the symlink-to-default-location-based-on-a-set-of-caches approach
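A rough sketch of what a symlink-based approach could look like, assuming the process declares a set of caches as a mapping from cache name to a default location relative to its sandbox. The function `mount_caches` and its arguments are hypothetical; the actual PR may differ in every detail.

```python
import os


def mount_caches(sandbox_dir: str, cache_root: str, caches: dict) -> None:
    """Symlink each declared cache into the sandbox.

    `caches` maps a cache name to the relative path at which the tool
    expects to find it by default. The persistent directory lives under
    `cache_root` and outlives the sandbox.
    """
    for name, rel_path in caches.items():
        persistent = os.path.join(cache_root, name)
        os.makedirs(persistent, exist_ok=True)

        link = os.path.join(sandbox_dir, rel_path)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        # The tool reads and writes its default location; the data
        # actually lands in the persistent directory.
        os.symlink(persistent, link)
```

With this shape the tool needs no environment variable to find its cache: an executor that supports the feature creates the links, and one that does not simply leaves the tool with an ordinary directory it creates itself.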

enough-analyst-54434

05/26/2020, 10:42 PM

Aha - looking.
