# general
b
I know there's docs regarding running `pylint` & friends in N processes instead of one big one (and letting the tool subprocess), but can't seem to find it. Perhaps it should be linked on this page? https://www.pantsbuild.org/v2.8/docs/python-linters-and-formatters
OK, so a suggestion: the per-language pages for global goals (`lint`, `fmt`, etc.) could link to the global docs as well? E.g. https://www.pantsbuild.org/docs/python-lint-goal linking to https://www.pantsbuild.org/docs/reference-lint#advanced-options?
h
That could make sense! Although a word of warning from the help message for `--per-file-caching`: it will normally be a lot slower, unless you've found otherwise, e.g. thanks to remote caching. Instead, @proud-dentist-22844 worked around this problem by using `[pylint].args` to have the tool manage its own concurrency.
👍 1
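(A minimal sketch of that workaround, assuming pylint's standard `--jobs` flag; the worker count of 4 is an arbitrary example:)
```
# Let pylint fork its own worker processes inside the single batched run
# by passing its parallelism flag through Pants.
./pants lint --pylint-args='--jobs=4' ::

# Or persistently, in pants.toml:
#   [pylint]
#   args = ["--jobs=4"]
```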
b
So it isn't transparent from the UI. If I change a single file, is Pants going to pass the one file to `pylint`, or all the files for my spec? E.g. running `./pants lint a/b/::` once, editing a file, and then running it again, I'd expect the second run to only run `pylint` on a single file. The logs read `Scheduling: Run Pylint on 22 files.`, which makes me think that's not the case.
h
For linting, we batch (unlike `test`) because we found there's usually too much overhead in launching a process per file. What you're describing is `--per-file-caching`. We do want to make some performance improvements to sandbox creation that might change that calculus, though, in favor of `--per-file-caching`.
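(A minimal sketch of opting in, assuming the option spelling from the help text quoted above; expect it to be slower, per that warning:)
```
# Trade batching throughput for per-file cache hits on a single run.
./pants lint --per-file-caching a/b/::

# Or persistently, in pants.toml:
#   [lint]
#   per_file_caching = true
```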
b
Well, with that batching, I'd expect batches of some number of files <= N, right? Batching makes sense, but sending the full set as one batch doesn't (at least to me, from the outside looking in).
To put this another way: `./pants lint ::` is painful in our monorepo, especially if I only edit one file. (I know about `--changed-since`, but ideally the cache can provide a lift here if people omit it.)
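(For reference, a minimal sketch of that flag; `origin/main` stands in for whatever ref you diff against:)
```
# Only lint targets changed since the given ref, instead of the whole repo.
./pants --changed-since=origin/main lint
```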
w
https://github.com/pantsbuild/pants/issues/13462 is probably the best ticket for this.
b
I totally agree with the linked issue: if I need to run on N > M (CPU cores) files, then M processes, each requesting batched subsets of the N files, makes the most sense. However, my question is: if I only touch one file, why do I see output mentioning all the files? (E.g. `no issues found in 21 source files`.)
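(Outside of Pants, that scheduling shape is roughly what `xargs` gives you; the 4 processes and batches of 25 are arbitrary example sizes:)
```
# Run up to 4 pylint processes in parallel (-P 4), each handed a batch
# of at most 25 files (-n 25), over only the files that changed.
git diff --name-only -- '*.py' | xargs -P 4 -n 25 pylint
```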
b
Oh, I see the disconnect now. When you run the linters with all the files in one batch, there's no way to know which individual files were good vs. bad without parsing the tool's output. Therefore the cached result is currently all-or-nothing for the batch.
I wonder if the tools either have a structured output mode, or would accept PRs that added one 🤔 Then you could batch for performance reasons, but not be forced into it for fine-grained caching.
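(Pylint, at least, does already ship a JSON reporter; each emitted message carries the file path, which is the kind of structured output that would let per-file results be recovered from one batched run. The file names here are just examples:)
```
# Machine-readable output: a JSON array of messages, each with a "path"
# field, so per-file pass/fail could be parsed out of a single batched run.
pylint --output-format=json a/b/one.py a/b/two.py
```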