Hello. Toolchain has been using V2 linters for two...
# development
h
Hello. Toolchain has been using V2 linters for two months now and, frankly, we’ve found the experience to be much worse than V1 due to the performance. We wrote up a proposal to allow sticking with the current caching scheme (which does have certain benefits) but to also allow using a different caching scheme for much better performance with colder caches, via a new option
--fmt-per-target-caching
and
--lint-per-target-caching
. Medium term, we wrote a proposal to replace that option with an implementation that has the benefits of both caching schemes I’d appreciate a look if you have a moment! https://docs.google.com/document/d/1Tdof6jx9aVaOGeIQeI9Gn-8x7nd6LfjnJ0QrbLFBbgc/edit#
w
Will look, but: are you using
--changed
flags in this usecase?
I don't think that per-target caching was intended to be a replacement for only operating on changed targets when you know about them.
h
We’re not using
--changed
. We want to be able to tell users to simply run
./pants fmt ::
. But even if we did, we expect per-target caching to not be worth it if 2-3+ targets have changed, depending on the tool and the size of the codebase. Linters are unique compared to Pytest in that the cost of running one additional file with a linter is incredibly low, whereas the overhead of starting the linter fresh is quite high -- There are downsides to losing per-target caching, which is why the option
--per-target-caching
is only temporary and we explain a medium-term proposal to get the best of both schemes, i.e. have fine-grained caching while avoiding the overhead of starting the tool multiple times
w
How much of that overhead is pex overhead?
Because the pex overhead is quite significant right now...
And addressing that addresses a lot of things
h
See the design doc. That benchmark was running the tools directly, without Pants or Pex, and showed 5-30x worse performance, depending on the tool (on cold caches).
w
Thanks, will look
h
The reason we proposed for the short-term to have it be an option is so that Twitter or anyone who wants to keep using per-target caching can continue to do so. Meanwhile, Toolchain can use the caching scheme that gets us better performance In the medium term, everyone will benefit from the combined caching scheme proposal
w
So, is the way to interpret the "per file" column: "time to run for X files" sequentially?
So 10 files -> 3 seconds for bandit, for example?
h
Yes, so it runs
black f1.py; black f2.py; black f3.py
, rather than
black f1.py f2.py f3.py
w
Ok, so then there are two other variables here
1. average target size
2. parallelism
That significantly changes the break even point, and might make --changed viable.
Because we've done the "invoke in a batch and then try to tease everything back apart" thing before, and it's not pretty
h
1. average target size
Yes, 1-1-1 proves to be a good pattern with linters
w
And a fourth foil is that you cannot enforce dependencies properly with batches
(Doesn't matter for pure formatters, matters for things that need deps)
h
(Doesn’t matter for pure formatters, matters for things that need deps)
We do have one example of this now:
pylint
, which needs direct dependencies (but not transitive dependencies)
Because we’ve done the “invoke in a batch and then try to tease everything back apart” thing before, and it’s not pretty
Is there somewhere I can read up on this? On the other side, John mentioned Python Ants used to run things per-target and then added a mechanism to try batching things for these similar performance concerns. https://docs.google.com/document/d/1Tdof6jx9aVaOGeIQeI9Gn-8x7nd6LfjnJ0QrbLFBbgc/edit?disco=AAAAGOanEo8
w
Zinc and Scala
Benjy knows
👍 1
h
Yeah, so splitting results was awful in that case, because we had to write complex to take the result of running zinc, which includes .class files and analysis data, and figure out which bit belonged to each individual target. In the case of the analysis data that was very hard to get right.
👍 1
But linters are different, in that we only need to cache the single bit "this file has no lint"
👍 1
In this specific case, it's easy.
So if the file does have lint, we don't split
👍 1
So for example, in a clean build,
./pants fmt ::
will run on 1000 files in one pass, and say 10 of them fail. We fix those and run
./pants fmt ::
, which again runs on 1000 files, but they all pass. Now we cache the fact "this file is lint free" for each of the 1000 files individually. Now we pull a change that touched 8 files.
./pants fmt ::
will consult the cache for 1000 files, notice that 992 of them have no lint, and run only on the 8 remaining, in one pass.
👍 1
We can do even better if a specific linter happens to make it easy to figure out which files passed vs which failed (say because it emits this information to a JSON file). Then that second
./pants fmt ::
run could only operate on the 10 files, not the entire 1000.
But that's a nice-to-have enhancement
The key different here is that there is no complicated splitting logic when files pass lint, the information is just a bit.
👍 1
We could not do this easily for, say, tests, or compiles, or anything that has output.
w
Ah, yea: John's note refers to the same thing