# development
FYI @witty-crayon-22786 @chilly-magazine-21545 I just saw Issue 10864: Improve MyPy performance, which talks about mypy's persistent cache and the challenges it causes. I'm volunteering some personal time to implement something similar for
issue link, which powers
. The overall approach isn't dissimilar from `mypy`'s initial implementation (per-module JSON stored in a cache dir, with byte length + mtime checked to decide whether to re-parse). Happy to take future requests to make it
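For reference, a minimal sketch of the size+mtime idea described above. This is only an illustration under assumptions: `load_or_parse`, the `parse` callback, and the cache file naming are all hypothetical, and none of it is taken from `mypy` or from the WIP PR.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def _cache_file(cache_dir: Path, source: Path) -> Path:
    # One JSON file per module, named by a hash of the resolved source path
    # (hypothetical naming scheme, chosen only to avoid path separators).
    name = hashlib.sha256(str(source.resolve()).encode()).hexdigest()
    return cache_dir / f"{name}.json"


def load_or_parse(source: Path, cache_dir: Path, parse: Callable[[Path], dict]) -> dict:
    """Return cached data for `source`, re-parsing only if size or mtime changed."""
    stat = source.stat()
    entry_path = _cache_file(cache_dir, source)

    if entry_path.exists():
        entry = json.loads(entry_path.read_text())
        # Cheap staleness check: byte length plus modification time.
        if entry["size"] == stat.st_size and entry["mtime_ns"] == stat.st_mtime_ns:
            return entry["data"]

    # Cache miss or stale entry: parse and persist the result.
    data = parse(source)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry_path.write_text(
        json.dumps({"size": stat.st_size, "mtime_ns": stat.st_mtime_ns, "data": data})
    )
    return data
```

Note that the entry is tied to the file's resolved path, which is exactly what becomes a problem once the tool runs inside a throwaway sandbox directory.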
we’ve come a long way on deciding how much flexibility to allow for caches, so i expect that we can make something work there.
even without its caches, mypy is a lot faster than pylint though, so it hasn’t been as much of a priority
I currently have a very ugly WIP PR which handles the transformation from Python object to JSON: No code for the actual persisting logic, but that's easy enough to write (I have it offline) and it'll likely be in a follow-up PR
I suppose that's what happens when you have Guido-level visibility/development on a project 😂
the big question around mypy (and probably pylint) is whether the cache needs to be per-repository, or global. @enough-analyst-54434’s comment at the end of the thread about allowing the cache to be global would make things easier for us.
love-it-or-hate-it has a harder job, as it tries to infer a shitton of info operating only on the AST.
AFAIK does actual importing so it has to infer much less
IIUC the cache key is absolute paths, so "global"
yea. the issue (which I had mostly forgotten: quite a thread) is that Pants runs things in sandboxes
so the absolute path would end up being a sandbox path, and would then need fixing.
John’s comment about switching to digests avoids that issue. but unclear how much work it would be upstream.
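To illustrate why digests sidestep the sandbox-path problem, here is a small sketch that keys on file content instead of location. It assumes SHA-256 over the raw bytes; `file_digest` is a hypothetical helper, not an API from Pants, mypy, or pylint.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash the file's bytes; the result does not depend on where the file lives."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

The same source file materialized in two different sandbox directories produces the same key, so a cache entry written in one sandbox can be found again from another.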
I'd be surprised if it became that much work, honestly.... but I've been surprised before 🙂
Do you have a doc page on Pants' sandboxing? I'm familiar with Bazel's but not Pants'
the actual implementation isn’t very well described, but the public API is:
inputs and outputs are specified using digests (with the same underlying data structures as for Bazel, because Pants supports remote execution with the REAPI)
Are the paths inside the temp folder copied or linked? Could the links be resolved by the underlying app (`mypy`/`pylint`)?
Ah, yes. Digest would be a hard requirement then
i expect that mypy’s sqlite schema could be adapted pretty easily (behind a flag even)… and yea, digest keyed would probably be a good first place to go for pylint
one challenge that i haven’t thought through is how dependencies are accounted for in that cache: it’s possible that mypy is relying on timestamps to bump cache entries when the dependencies have changed (despite their direct content being identical). and that complicates things cc @enough-analyst-54434
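One possible way to account for dependencies without timestamps, sketched under assumptions: fold the keys of a module's dependencies into its own key, so a change in a dependency bumps every dependent even when the dependent's content is identical. This is not how mypy is known to do it; `cache_key`, `sources`, and `deps` are hypothetical names, and the sketch assumes the import graph has no cycles.

```python
import hashlib
from typing import Dict, List, Optional


def cache_key(
    module: str,
    sources: Dict[str, bytes],
    deps: Dict[str, List[str]],
    _memo: Optional[Dict[str, str]] = None,
) -> str:
    """Key = hash of the module's own content plus the keys of its dependencies.

    Assumes an acyclic dependency graph; an import cycle would recurse forever.
    """
    if _memo is None:
        _memo = {}
    if module in _memo:
        return _memo[module]

    h = hashlib.sha256()
    h.update(hashlib.sha256(sources[module]).hexdigest().encode())
    # Sort so the key does not depend on the order dependencies were listed in.
    for dep in sorted(deps.get(module, [])):
        h.update(cache_key(dep, sources, deps, _memo).encode())

    _memo[module] = h.hexdigest()
    return _memo[module]
```

With this scheme, editing `b` changes the key for any `a` that imports `b`, while unrelated modules keep their keys.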
I believe that is true
FWIW I chose path+mtime (and I suspect mypy did as well) because that's what Python uses for pyc files