# development
b
FYI @witty-crayon-22786 @chilly-magazine-21545 I just saw Issue 10864: Improve MyPy performance, which talks about MyPy's persistent cache and the challenges it causes. I'm volunteering some personal time to implement something similar for `astroid` (issue link), which powers `pylint`. The overall approach isn't dissimilar from `mypy`'s initial implementation (per-module JSON stored in a cache dir; look at byte-length+mtime to know whether to re-parse). Happy to take future requests to make it `pants`-compatible
👀 1
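A minimal sketch of the approach described above (one JSON file per module, re-parsed only when the source's byte length or mtime changes). The names `load_or_parse`, `cache_path`, and the `parse` callback are hypothetical, and this is not the code from the astroid PR:

```python
import json
import os
from pathlib import Path


def cache_path(cache_dir: Path, module_name: str) -> Path:
    # One JSON file per module, e.g. "pkg.mod" -> <cache_dir>/pkg.mod.json
    return cache_dir / f"{module_name}.json"


def load_or_parse(cache_dir, module_name, source, parse):
    """Return (data, cache_hit), calling parse(source) only when the fingerprint changed."""
    st = os.stat(source)
    # Byte length + mtime stand in for "has the file changed?"
    fingerprint = {"size": st.st_size, "mtime_ns": st.st_mtime_ns}
    entry_file = cache_path(Path(cache_dir), module_name)
    if entry_file.exists():
        entry = json.loads(entry_file.read_text())
        if entry["fingerprint"] == fingerprint:
            return entry["data"], True
    data = parse(source)  # must return something JSON-serializable
    entry_file.parent.mkdir(parents=True, exist_ok=True)
    entry_file.write_text(json.dumps({"fingerprint": fingerprint, "data": data}))
    return data, False
```

The fingerprint is cheap to compute because it never reads the file's contents, which is the main appeal of size+mtime over hashing.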
w
nice!
we’ve come a long way on deciding how much flexibility to allow for caches, so i expect that we can make something work there.
even without its caches, mypy is a lot faster than pylint though, so it hasn’t been as much of a priority
b
I currently have a very ugly WIP PR which handles the transformation from Python object to JSON: https://github.com/PyCQA/astroid/pull/1194 No code for the actual persisting logic, but that's easy enough to write (I have it offline) and it'll likely be in a follow-up PR
I suppose that's what happens when you have Guido-level visibility/development on a project 😂
w
😃
the big question around mypy (and probably pylint) is whether the cache needs to be per-repository, or global. @enough-analyst-54434’s comment at the end of the thread about making the cache global would make things easier for us.
b
Also FWIW `astroid`, love-it-or-hate-it, has a harder job, as it tries to infer a shitton of info operating only on the AST. `mypy` AFAIK does actual importing so it has to infer much less
IIUC the cache key is absolute paths, so "global"
w
yea. the issue (which I had mostly forgotten: quite a thread) is that Pants runs things in sandboxes
so the absolute path would end up being a sandbox path, and would then need fixing.
John’s comment about switching to digests avoids that issue. but unclear how much work it would be upstream.
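To illustrate why digest keys sidestep the sandbox problem, here is a rough sketch of a cache keyed on file content rather than absolute path. `DigestCache` and `content_digest` are hypothetical names, and SHA-256 is an assumed choice of hash, not something mypy, pylint, or Pants is confirmed to use here:

```python
import hashlib
from pathlib import Path


def content_digest(path) -> str:
    # The key depends only on the bytes of the file, not on where it lives.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class DigestCache:
    """In-memory cache keyed by content digest (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def get_or_compute(self, path, compute):
        key = content_digest(path)
        if key in self._entries:
            return self._entries[key], True
        result = compute(path)
        self._entries[key] = result
        return result, False
```

The same file copied into two different sandbox directories produces the same key, so the second run is a hit. A real cache would probably also fold the module name or relative path into the key, since two modules with identical content can still differ in how they resolve imports.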
b
I'd be surprised if it became that much work, honestly.... but I've been surprised before 🙂
Do you have a doc page on Pants' sandboxing? I'm familiar with Bazel's but not Pants'
w
the actual implementation isn’t very well described, but the public API is: https://www.pantsbuild.org/docs/rules-api-process
inputs and outputs are specified using digests (with the same underlying data structures as for Bazel, because Pants supports remote execution with the REAPI)
b
Are the paths inside the temp folder copied or linked? Could the links be resolved by the underlying app (`mypy`/`pylint`)?
w
copied
b
Ah, yes. Digest would be a hard requirement then
w
i expect that mypy’s sqlite schema could be adapted pretty easily (behind a flag even)… and yea, digest keyed would probably be a good first place to go for pylint
one challenge that i haven’t thought through is how dependencies are accounted for in that cache: it’s possible that mypy is relying on timestamps to bump cache entries when the dependencies have changed (despite their direct content being identical). and that complicates things cc @enough-analyst-54434
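One way the dependency concern could be handled without timestamps is to fold each dependency's key into the module's own key, so a change anywhere downstream changes the key even when the module's content is identical. This is only a sketch: `cache_key`, `sources`, and `deps` are hypothetical, and it assumes an acyclic import graph (real Python import cycles would need extra handling):

```python
import hashlib


def cache_key(module, sources, deps, _memo=None):
    """Digest of a module's source combined with the keys of its direct deps.

    Because each dep's key already includes its own deps, the result covers
    the transitive closure. Assumes the import graph has no cycles.
    """
    memo = {} if _memo is None else _memo
    if module not in memo:
        h = hashlib.sha256(sources[module].encode())
        # Sort so the key does not depend on the order deps are listed in.
        for dep in sorted(deps.get(module, ())):
            h.update(cache_key(dep, sources, deps, memo).encode())
        memo[module] = h.hexdigest()
    return memo[module]
```

With this scheme, editing `b` invalidates the entry for any `a` that imports `b`, while unrelated modules keep their keys.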
b
I believe that is true
FWIW I chose path+mtime (and I suspect mypy did as well) because that's what Python uses for pyc files
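For reference, the pyc precedent can be inspected directly. In CPython 3.7+ a timestamp-based `.pyc` starts with a 16-byte header holding the magic number, a flags word, the source mtime, and the source size. The helper name `pyc_source_stamp` is hypothetical; the sketch forces timestamp invalidation explicitly, since hash-based pycs lay out the header differently:

```python
import importlib.util
import py_compile
import struct


def pyc_source_stamp(source_path):
    """Compile source_path and return the (mtime, size) recorded in its .pyc header."""
    pyc_path = py_compile.compile(
        source_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Little-endian: 4-byte magic, then three unsigned 32-bit fields.
    magic, flags, mtime, size = struct.unpack("<4sIII", header)
    if magic != importlib.util.MAGIC_NUMBER or flags != 0:
        raise ValueError("not a timestamp-based pyc for this interpreter")
    return mtime, size
```

The recorded values are the source file's mtime (truncated to whole seconds) and byte size, both masked to 32 bits, which the import system compares against the source on disk to decide whether to recompile.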