Pants #development

also, it looks like we don't set the `digest_hint`...

aloof-angle-91616

09/27/2020, 2:52 PM

also, it looks like we don't set the `digest_hint` in `PathGlobsAndRoot` anywhere. were we doing that manually in the parallel `RscCompile` task before? i'm looking to see whether we could support an optimization to avoid snapshotting process execution output files/directories that we already know about as per this comment on the mutable caches doc (which i didn't know was already implemented -- this is super awesome): https://docs.google.com/document/d/1n_MVVGjrkTKTPKHqRPlyfFzQyx2QioclMG_Q3DMUgYk/edit?disco=AAAAIva7gMw

❤️ 1

aloof-angle-91616

09/27/2020, 3:37 PM

made into https://github.com/pantsbuild/pants/issues/10870

hundreds-father-404

09/27/2020, 4:01 PM

We don’t use PathGlobsAndRoot anymore in production. We realized we do need to expose it through the rules API though. I think we would keep the digest hint

aloof-angle-91616

09/27/2020, 4:01 PM

yes! i am about to comment on stu's issue right now: https://github.com/pantsbuild/pants/issues/10842

aloof-angle-91616

09/27/2020, 4:04 PM

https://github.com/pantsbuild/pants/issues/10842#issuecomment-699653890

👍 1

aloof-angle-91616

09/27/2020, 4:05 PM

stu's idea was better

aloof-angle-91616

09/27/2020, 4:11 PM

the digest hint is a good idea for append-only caches, i think polling or `notify` works for caches like mypy's
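
A minimal sketch of the polling idea, assuming plain Python and no Pants APIs: fingerprint every file under a cache dir, then diff the fingerprints between two polls. `fingerprint_dir` and `changed_paths` are hypothetical names made up for illustration, and hashing whole files on every poll is a simplification that a real implementation would likely avoid.

```python
import hashlib
import os
from typing import Dict, Set


def fingerprint_dir(root: str) -> Dict[str, str]:
    """Map each file under `root` (as a relative path) to the SHA-256 of its contents."""
    fingerprints: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            fingerprints[os.path.relpath(full, root)] = digest
    return fingerprints


def changed_paths(before: Dict[str, str], after: Dict[str, str]) -> Set[str]:
    """Return the paths that were added, removed, or modified between two polls."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}
```

For an append-only cache the diff would only ever contain additions, which is roughly why a digest hint is enough there, while a cache that rewrites files in place (like mypy's) needs the full comparison.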

aloof-angle-91616

09/27/2020, 4:13 PM

and also i really like how there are possibly now two workstreams, one for parenting the mypy daemon locally, another that could make it remotable if we poll the cache dir

💯 1

aloof-angle-91616

09/27/2020, 4:20 PM

and i think parenting the mypy daemon is likely to be more important for performance (and therefore higher priority), but i think that the `digest_hint` files can make pex invocations remotable while retaining the cache dir, so i think both have immediate use cases

hundreds-father-404

09/27/2020, 4:24 PM

I’m not sure how the daemon works, if it requires the cache to exist and be used. There is no way to turn off writing to the cache in MyPy. (I looked to save unnecessary work, since we don’t preserve it anyways)

aloof-angle-91616

09/27/2020, 4:27 PM

i think that we could have a global cache dir for mypy the way we do for other process executions right now. i think that the daemon part would need to be a whole thing to implement (but would likely share code or ideas from the nailgun process execution code, which is currently unused), but if we could fit it into a normal `Process` execution (maybe a wrapper struct, or a separate field that says how to create the daemon), we could make it use the same kind of cache with a symlink
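
One way to picture the "wrapper struct" option, as a sketch only: every class below is hypothetical and none of it is the real Pants `Process` API. The client request stays an ordinary process description, and an optional extra field says how to create the daemon.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProcessSketch:
    """Stand-in for a normal process execution request (not the real Pants type)."""

    argv: Tuple[str, ...]
    description: str


@dataclass(frozen=True)
class DaemonSpec:
    """Hypothetical: how to start the long-lived daemon and which cache dir it uses."""

    start_argv: Tuple[str, ...]
    cache_dir_name: str


@dataclass(frozen=True)
class DaemonizedProcess:
    """Hypothetical wrapper: a client process plus an optional daemon to parent."""

    client: ProcessSketch
    daemon: Optional[DaemonSpec] = None

    @property
    def needs_daemon(self) -> bool:
        # Without a daemon spec this degrades to a plain process execution.
        return self.daemon is not None
```

A request for the mypy daemon might then look like `DaemonizedProcess(client=ProcessSketch(("dmypy", "check", "src"), "typecheck"), daemon=DaemonSpec(("dmypy", "start"), "mypy_cache"))`, where the argv values and the cache dir name are illustrative.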

aloof-angle-91616

09/27/2020, 4:30 PM

since we created a doc for the mutable caches, i think it might make sense to create another doc describing how we might implement a daemonized process invocation? i might do that -- @witty-crayon-22786 let me know if there is already such a doc

aloof-angle-91616

09/27/2020, 5:55 PM

made a doc with an idea about a possible API for persistent workers in pants (right now, the mypy daemon seems most immediately useful). it doesn't try to specify how we would communicate with the daemon -- instead, the `client` process is expected to handle that part. it slightly modifies the API for nailgunnable processes -- all of that is freely bikeshedable. lmk if this was already done: https://docs.google.com/document/d/1hSspjRLGO05-tB16NevvUW87rqIKHjUZJ4xv2YfCL1c/edit?usp=sharing

Persistent Workers for Pants

witty-crayon-22786

09/27/2020, 6:06 PM

good morning

aloof-angle-91616

09/27/2020, 6:07 PM

good morning!

witty-crayon-22786

09/27/2020, 6:10 PM

reading things

aloof-angle-91616

09/27/2020, 6:11 PM

there were lots of things! i tried to edit all the github issues down to make them clearer. i recognize there's a lot, sorry about that

aloof-angle-91616

09/27/2020, 6:29 PM

you'll see a github ping from me in a sec, it's just writing down your response on the mutable caches doc into the issue i created

aloof-angle-91616

09/27/2020, 9:58 PM

thanks a ton stu that multiplied my efforts tenfold

witty-crayon-22786

09/27/2020, 10:03 PM

Sure thing. I think the daemonized process API you've suggested looks good, and it's worth removing the parsing magic if we're going to use the API for more languages (or we could move the parsing to the python side to construct the more specific type).

🙌 1

aloof-angle-91616

09/27/2020, 10:04 PM

ok, that makes me more comfortable

witty-crayon-22786

09/27/2020, 10:04 PM

Would just caution that the "where is the working directory" and "where are the files" and "what is stable across runs" bits are what tripped us up before

👍 1

aloof-angle-91616

09/27/2020, 10:05 PM

yes, that is what caused me to say "oopsie" and backtrack

aloof-angle-91616

09/27/2020, 10:05 PM

thanks

witty-crayon-22786

09/27/2020, 10:06 PM

Zinc and mypy both assume(d) stable paths, and I don't know if the Bazel worker API requires a form of sandboxing that would leave files "in the working copy"

✍️ 1

aloof-angle-91616

09/27/2020, 10:06 PM

i'm interested in the FUSE part even separate from any daemons because i think it could just make everything ridiculously fast and i think i've looked at the `brfs` crate literally once so that seems like it might be a generally useful workstream

aloof-angle-91616

09/27/2020, 10:07 PM

i mentioned on the ticket i think path rewriting might also be something we could make ~generic across tools (just by regex stuff mostly), so i don't think that's too unreasonable either and would be less upfront work (i really liked the idea of making mypy recursive)
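
A minimal sketch of what tool-agnostic path rewriting "by regex" could look like, assuming the only unstable part is the absolute sandbox root. The `__SANDBOX_ROOT__` token and both function names are hypothetical, picked for illustration.

```python
import re

# Hypothetical placeholder token; any string that cannot occur in real output would do.
PLACEHOLDER = "__SANDBOX_ROOT__"


def relativize(text: str, sandbox_root: str) -> str:
    """Replace each occurrence of the absolute sandbox root with a stable placeholder."""
    root = sandbox_root.rstrip("/")
    # The negative lookahead keeps /tmp/sandbox1 from matching inside /tmp/sandbox12.
    pattern = re.compile(re.escape(root) + r"(?![\w.-])")
    return pattern.sub(PLACEHOLDER, text)


def restore(text: str, new_root: str) -> str:
    """Substitute the placeholder with the root of the current run."""
    return text.replace(PLACEHOLDER, new_root.rstrip("/"))
```

The output of one run would be passed through `relativize` before being cached, and through `restore` when materialized into the next run's working directory.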

aloof-angle-91616

09/27/2020, 10:08 PM

i think those both probably touch the same general concepts too so will consider both

witty-crayon-22786

09/27/2020, 10:10 PM

Yea. I wish that it were easier to experiment to see how fast/slow recursive mypy would be. But I don't know if there is a way to run the experiment much more easily than "actually doing all the path rewriting". Maybe by doing it only for hardcoded paths? Unknown.

aloof-angle-91616

09/27/2020, 10:13 PM

i'll take a look at the files in the mypy cache and see if it seems reasonable. my hope is that if they're within the working dir we can just scan for the string of the cwd relatively unambiguously. my hope

aloof-angle-91616

09/27/2020, 10:20 PM

hmmmmmmmmmmm there appears to be exactly one absolute path (or path at all) in the mypy cache output, and it's located in the single top-level key `path` in every single json file (i believe there are only json files). this could be rewritten with some json parser although i suspect more quickly with regex

aloof-angle-91616

09/27/2020, 10:20 PM

e.g.

<.mypy_cache/3.5/uuid.meta.json jq '.' | g '/Users'
(standard input):62:  "path": "/Users/dmcclanahan/tools/pex/.tox/typecheck/lib/python3.8/site-packages/mypy/typeshed/stdlib/2and3/uuid.pyi",
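
The json-parser route could be sketched like this, assuming (as observed above) that the only path lives in the top-level `path` key. `relativize_path_key` is a hypothetical helper, and re-serializing with `json.dump` may not preserve the exact formatting mypy writes.

```python
import json
import os


def relativize_path_key(meta_file: str, root: str) -> bool:
    """Rewrite an absolute top-level "path" that lies under `root` to be relative to it.

    Returns True if the file was modified, False if it was left alone.
    """
    with open(meta_file) as f:
        data = json.load(f)
    path = data.get("path")
    if not isinstance(path, str) or not os.path.isabs(path):
        return False  # already relative, or no path key at all
    root = os.path.abspath(root)
    if os.path.commonpath([root, path]) != root:
        return False  # absolute, but outside `root`
    data["path"] = os.path.relpath(path, root)
    with open(meta_file, "w") as f:
        json.dump(data, f)
    return True
```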

aloof-angle-91616

09/27/2020, 10:21 PM

i would need to think more about the pipeline that's needed here but this seems like a layup if anything. will delve more

aloof-angle-91616

09/27/2020, 10:22 PM

and there are only json files (and one gitignore):

> find .mypy_cache -type f | sed -re 's#.*(\.[^\.]+)#\1#g' | sort -u
.gitignore
.json
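
For anyone without GNU `sed` handy, roughly the same check in Python (a sketch; `unique_suffixes` is a made-up name, and files with no dot in their name are reported by their full name, which differs slightly from the `sed` expression):

```python
import os
from typing import Set


def unique_suffixes(root: str) -> Set[str]:
    """Collect the final extension of every file under `root`."""
    suffixes: Set[str] = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Keep everything from the last "." onward, so ".gitignore" stays whole
            # and "uuid.meta.json" becomes ".json".
            idx = name.rfind(".")
            suffixes.add(name[idx:] if idx != -1 else name)
    return suffixes
```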

aloof-angle-91616

09/27/2020, 10:24 PM

it actually appears that all of the paths are relative except the ones from the stdlib

aloof-angle-91616

09/27/2020, 10:24 PM

> find .mypy_cache -type f | parallel "echo -n '{}:' && jq -r '.path' <{}" | head -n3
.mypy_cache/3.5/test_resolver.meta.json:tests/test_resolver.py
.mypy_cache/3.5/atexit.meta.json:/Users/dmcclanahan/tools/pex/.tox/typecheck/lib/python3.8/site-packages/mypy/typeshed/stdlib/3/atexit.pyi
.mypy_cache/3.5/pex/testing.data.json:pex/testing.py
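
The same survey as a Python sketch, grouping the `path` values into absolute and relative so the stdlib outliers stand out. `paths_by_kind` is a hypothetical helper and assumes every `.json` file in the cache holds a top-level object.

```python
import json
import os
from typing import Dict, List


def paths_by_kind(cache_dir: str) -> Dict[str, List[str]]:
    """Group the top-level "path" of each JSON cache file into absolute vs. relative."""
    result: Dict[str, List[str]] = {"absolute": [], "relative": []}
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            if not name.endswith(".json"):
                continue  # skips the lone .gitignore
            with open(os.path.join(dirpath, name)) as f:
                data = json.load(f)
            path = data.get("path") if isinstance(data, dict) else None
            if isinstance(path, str):
                kind = "absolute" if os.path.isabs(path) else "relative"
                result[kind].append(path)
    return result
```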

aloof-angle-91616

09/27/2020, 10:25 PM

i'll dump this in the ticket

aloof-angle-91616

09/27/2020, 10:33 PM

https://github.com/pantsbuild/pants/issues/10864#issuecomment-699696539

aloof-angle-91616

09/27/2020, 10:36 PM

i think the absolute paths can be turned into relative ones if we materialize typeshed types into the process execution dir. there's another symlink optimization we might consider for that (noted in the ticket)

witty-crayon-22786

09/27/2020, 10:41 PM

Interesting. Yea, dumping some info about that would be good. I thought I had put more info in the doc, but apparently not

witty-crayon-22786

09/27/2020, 10:42 PM

There are timestamps in there as well, iirc

aloof-angle-91616

09/27/2020, 10:42 PM

ah!!!!!

aloof-angle-91616

09/27/2020, 10:42 PM

thank you

aloof-angle-91616

09/27/2020, 10:43 PM

yes, that's correct. the `mtime` key in the `*.meta.json`. will add that

aloof-angle-91616

09/27/2020, 10:43 PM

there appear to be some duplicated fields as well (e.g. `data_mtime`), oh bother

aloof-angle-91616

09/27/2020, 10:44 PM

and there's a `platform`, which is still fine i think
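
A small sketch for checking whether two runs produced the same cache entry once the timestamp fields are ignored. The key list only covers the two timestamp keys named in this thread and should be treated as an assumption, not a complete inventory; `path` is left in on the assumption that it has already been made relative.

```python
import json
from typing import Any, Dict

# Timestamp keys mentioned in the thread; assumed, not exhaustive.
UNSTABLE_KEYS = ("mtime", "data_mtime")


def stable_view(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the keys that are expected to differ between otherwise identical runs."""
    return {k: v for k, v in meta.items() if k not in UNSTABLE_KEYS}


def same_modulo_unstable(a_json: str, b_json: str) -> bool:
    """Compare two serialized meta entries while ignoring the unstable keys."""
    return stable_view(json.loads(a_json)) == stable_view(json.loads(b_json))
```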

aloof-angle-91616

09/27/2020, 10:52 PM

posted a gist and a rundown of the unstable fields: https://github.com/pantsbuild/pants/issues/10864#issuecomment-699698439

aloof-angle-91616

09/27/2020, 10:52 PM

i think someone should be able to pick it up with that info

aloof-angle-91616

09/27/2020, 11:01 PM

wrote a comment, i think the recursive method will work with the rewriting scheme after each run and there's enough info to just do that

aloof-angle-91616

09/27/2020, 11:01 PM

going to step off for a bit before trying to implement it so @hundreds-father-404 can take a look at it

❤️ 1

aloof-angle-91616

09/27/2020, 11:02 PM

and will look into `brfs`
