(Tangentially related to <https://github.com/pants...
# development
b
(Tangentially related to https://github.com/pantsbuild/pants/issues/13462) What would it look like to have Pants run tools (lint) on an N >1 input of files, while maintaining a N cache-entries? 🧵
This is probably the biggest heartburn I get from Pants. I lint my repo and get an all clear. Edit 1 file, and now I'm linting N files again. I could lint per-file, but the startup cost is steep (and we don't remote cache). In the dumbest sense, if linting with a tool passes for N files, pants could cache the result for those N files, separately. Changing one file only invalidates the one cache entry for that file. To avoid non-structured tool output, you could resort to N-file cache entry on failed lint. Eventually if tools had structured output, you could improve this.
At a high-level glance, what would it take to move Pants in this direction? I'm happy to contribute when possible, but assume this would require engine changes, so guidance would be nice.
c
Are you using
--changed-since
to reduce the scope of your lint?
b
Long-story-short, not yet 🙂 Longer-story, we have similar functionality in our in-house tools which has been implemented improperly, so there's distrust in validity unless we lint the world. I'm working on building trust, but want to limit the scope of changes. Moving to Pants is a biggun 😉 Also, IIRC you should
pylint
transitively since it traverses the import hierarchy. So there's another penalty even if I am using
--changed-since
😕
c
We lint the world in CI but on dev machines we have aliases like
lint
and
fmt
that use
--changed-since=main
. These run in about 1s - for reasons you mention there is a small chance something wouldn't get caught until CI but we haven't practically seen it yet.
👀 1
w
Also, IIRC you should 
pylint
 transitively since it traverses the import hierarchy. So there’s another penalty even if I am using 
--changed-since
Should
pylint
maybe be in
check
with
mypy
? Performance is a bit worse, and the fact that it is transitive makes it potentially a better fit there
the “cache a portion of a process” thing has been discussed, but it’s risky since it violates hermeticity. partitioning to reasonably sized batches has most of the benefits, without those risks
1
but unclear… we might still need to do it at some point.
b
I suppose a good first step to either option would be that structured output from tools 🤔
pylint
is funny because some it dances on the line. I suppose it depends on the semantics of
check
and
lint
really. (Only lightly going to point out a tool with "lint" in the name being in "check" 😉 )
A third, more complex option to the transitivity issue might be that tools mark themselves as transitive or not, and then have a value
changed-dependees=implicit
(or similar)
1
h
"cache a portion of a process" might be reasonable for linters because you'd just be caching a single bit, namely the fact that file passed lint. It gets harder when there are outputs and you have to split them up across the input files.
w
but yea,
./pants --loop --changed-since=HEAD check fmt lint
is my bread and butter
h
But I do think this is something we'll need to do, and give the user a knob to trade off hermeticity for performance
w
imo, we should prefer performance improvements that can be enabled by default because they are safe. but maybe.
https://github.com/pantsbuild/pants/issues/13462 is very much on the radar… unfortunately a few of us are busy polishing Go and getting the preview of JVM support out
b
I think that issue + Rust-embedding-Python and
pants
will likely... blow your pants off? Hmm 🤔
😂 2
🙄 2
h
FYI Pylint only needs direct deps, not transitive.
--changed-dependees=direct
b
🤔 I'll have to run a check next opportunity
h
Should work! The code only pulls in direct deps. If it needed transitive deps, the build would fail because the sandbox would be missing those files
b
Copy code
# a.py
A = NotImplemented
Copy code
# b.py
from .a import A
Copy code
# c.py
from .b import A

def foo():
    raise A
Copy code
$ pylint c.py
...
c.py:4: [E0702(raising-bad-type), foo] Raising NotImplementedType while only classes or instances are allowed
👀
f
some context: I’ve floated an idea in the past with some of the other maintainers to add an internal ability in the engine to generate “synthetic actions” from batched computation, i.e. run a batch `Process`but then generate a synthetic individual
Process
for each file in the batch and put those in the cache
(the trick being that the Remote Execution API model for build actions constrains the solution somewhat)
(since the Pants model is close in design to the REAPI)
b
(@hundreds-father-404 I'll move the
pylint
thing to a new thread)