gt we should probably just re enable the pex cache and trus Pants #development

> ...we should probably just re-enable the pex ...

hundreds-father-404

03/16/2020, 11:32 PM

...we should probably just re-enable the pex cache, and trust it to be concurrency safe.

We should probably take this to the new channel #performance , but we’ve been discussing this proposal at Toolchain and think it’s likely a good change to re-enable. At a minimum, re-enable when we know it’s local execution. Pex has sane caching afaict (John put a lot of sweat into ensuring that)

witty-crayon-22786

03/16/2020, 11:33 PM

yea.

witty-crayon-22786

03/16/2020, 11:33 PM

ditto

coursier

, for example.

👍 1

witty-crayon-22786

03/16/2020, 11:33 PM

it's not a universal truth, but.

hundreds-father-404

03/16/2020, 11:35 PM

MyPy is another example. MyPy has very sane caching that the Dropbox engineers spent 2 years working on. We’ve found that ignoring that MyPy cache results in roughly 5-10x worse performance (not formally benchmarked)

witty-crayon-22786

03/16/2020, 11:36 PM

re-enable when we know it’s local execution

i'm not sure this differentiation is worth it... in the remote case, the cost of writing to the cache should be minimal. and depending on the remoting implementation, it might not be in docker, which would allow it to actually share caches across invokes.

👍 1

hundreds-father-404

03/16/2020, 11:37 PM

The only gotcha is that Google RBE is read-only. That’s why John originally turned off the Pex cache

hundreds-father-404

03/16/2020, 11:38 PM

But, maybe we could use a simple heuristic or linux command to see if the FS is read-only? Only use the cache iff it’s possible?

witty-crayon-22786

03/16/2020, 11:38 PM

hm... read only? really?

hundreds-father-404

03/16/2020, 11:39 PM

(let me retrace that PR to confirm. I remember seeing it last week when I checked this out)

witty-crayon-22786

03/16/2020, 11:39 PM

funky.

hundreds-father-404

03/16/2020, 11:40 PM

https://github.com/pantsbuild/pants/issues/8283

witty-crayon-22786

03/16/2020, 11:41 PM

Ah, so not RBE. My understanding is that in docker, by default the homedir would be writable

witty-crayon-22786

03/16/2020, 11:42 PM

But... what does "default" really mean.

hundreds-father-404

03/16/2020, 11:46 PM

Do you think it would be better to implement “In the Python rule, allow passing extra args to the EPR iff running locally” or “Add some mechanism in the Python rule to inspect if the FS is readable, then use that to conditionally create the EPR args”

hundreds-father-404

03/16/2020, 11:47 PM

Neither sound great, but re-enabling the Pex cache would be an enormous win for V2 performance. Resolving requirements is by far the biggest slowdown

enough-analyst-54434

03/16/2020, 11:48 PM

Google RBE is (or was) not read only, which is bad. Toolchain's remote exec env is though mod the chroot dir.

👍 1

enough-analyst-54434

03/16/2020, 11:49 PM

If we're going to support "deemed safe" caches, we really need to plumb a hole into EPRs to allow for input digests demmed safe and not part of fingerprint.

👍 1

witty-crayon-22786

03/16/2020, 11:49 PM

well, this gets to the point i initially raised then.

witty-crayon-22786

03/16/2020, 11:50 PM

we have that actually... had forgotten.

enough-analyst-54434

03/16/2020, 11:50 PM

witty-crayon-22786

03/16/2020, 11:50 PM

with the downside that it actually computes the whole input digest, which might not be the best for a cache.

hundreds-father-404

03/16/2020, 11:50 PM

unsafe_local_only_files_because_we_favor_speed_over_correctness_for_this_rule

😀 1

enough-analyst-54434

03/16/2020, 11:50 PM

This will be a problem for all resolvers afaict, so a standard solution will be great.

👍 1

witty-crayon-22786

03/16/2020, 11:50 PM

and which gets to the point i was originally making about "resolve per library".

hundreds-breakfast-49010

03/16/2020, 11:51 PM

running locally seems like more of a fundamental operation as far as pants is concerned, than checking if a file system is rw or not

👍 1

happy-kitchen-89482

03/17/2020, 5:00 PM

Not sure I understand "resolve per library"? Today e.g., when you run tests we resolve once for the dep closure. What are you proposing?

witty-crayon-22786

03/17/2020, 5:01 PM

resolving once per library! literally

witty-crayon-22786

03/17/2020, 5:01 PM

that would have the effect of populating a cache of artifacts at each step

witty-crayon-22786

03/17/2020, 5:01 PM

which would grow as you moved "up" the graph

witty-crayon-22786

03/17/2020, 5:02 PM

such that when you were building a root, most things would be in the cache.

witty-crayon-22786

03/17/2020, 5:03 PM

since hitting the resolve cache is not "massively" faster than missing it (due to overheads in pip, mostly), it's unlikely to be a win without refactorings

hundreds-father-404

03/17/2020, 5:03 PM

By library, do you mean 3rd party distribution? Then using PEX_PATH to merge them all?

witty-crayon-22786

03/17/2020, 5:04 PM

python_library

hundreds-father-404

03/17/2020, 5:04 PM

Hm, I don’t follow. What precisely do you mean by “resolve” here?

witty-crayon-22786

03/17/2020, 5:04 PM

you would not be able to merge with pex_path, because the resolves would be different at each stage

witty-crayon-22786

03/17/2020, 5:04 PM

a pex/pip resolve for the transitive dependencies of the target

Open in Slack

Previous Next