Would it make sense to extend `per_file_caching` ...
# plugins
b
Would it make sense to extend the `per_file_caching` option to each linter itself? I'm thinking some linters are faster than others (`pylint` traverses imports and does crazy inference, whereas `black` does very little work). The tipping point here would be something like: `pylint` is faster to read from the cache than to re-run, but for `black` the difference might be in the noise
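Something like this, purely as a sketch (these per-tool flags don't exist today, just illustrating the shape of the idea):
Copy code
# Today: one global switch that applies to every linter
./pants --lint-per-file-caching lint ::

# Hypothetical per-tool overrides (NOT real flags, just the proposal):
./pants --pylint-per-file-caching --no-black-per-file-caching lint ::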
h
Definitely reasonable!! It would override the global default. It will require changes to the Plugin API, but that may be worth it. Would you have a chance to benchmark this idea to see if it's actually worth doing, and then file a feature request if so? Maybe use something like hyperfine to have Pants run Black in isolation, followed by Pylint in isolation. See https://www.pantsbuild.org/v2.8/docs/contributions-debugging for how we benchmark Pants
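Roughly something like this (just a sketch; swap in the skip flags for whichever linters your repo actually enables):
Copy code
# Pylint in isolation: batched vs. per-file caching
hyperfine --warmup 1 \
  './pants --black-skip --isort-skip lint ::' \
  './pants --black-skip --isort-skip --lint-per-file-caching lint ::'

# Black in isolation, same comparison
hyperfine --warmup 1 \
  './pants --pylint-skip --isort-skip lint ::' \
  './pants --pylint-skip --isort-skip --lint-per-file-caching lint ::'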
🙌 1
b
Sure thing!
I'm almost afraid to run `--lint-per-file-caching` on `pylint` in our repo 😂
I'll report back... next year
h
Haha, I hear you there. I recommend running over a subset with `./pants lint dir::` or `dir:`
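(In case it's not obvious: `dir:` only hits targets defined directly in that directory, while `dir::` recurses into everything beneath it. The path here is just a placeholder:)
Copy code
./pants lint src/mypkg:    # targets in src/mypkg only
./pants lint src/mypkg::   # src/mypkg and every subdirectory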
b
OK, so I'll need to check my understanding on this one. Obviously `isort` (we don't use `black`) gets less performant, but `pylint` did as well, by a long shot... `pylint` went from ~15s one-process to ~72s per-file. Does Pants still spin up the subprocess if the result should be cached?
h
> Does Pants still spin up the subprocess if the result should be cached?
This is where you will want to test both cold cache and warm cache, using the techniques from the Pants debugging page I sent. Cold cache is always going to be slower with per-file caching (one process per file). But the question is how much worse, and whether the warm cache is enough faster to make it worth it.
To simulate the warm cache, you'd want to do something like change 1 file out of 50, and still use pantsd and caching. Whereas for the cold cache, you should disable both.
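Concretely, something along these lines (the path is a placeholder, and double-check that `--no-local-cache` exists in your Pants version):
Copy code
# Cold cache: no daemon, no local cache, so every process really runs
# (assumes --no-local-cache is available in this Pants version)
hyperfine './pants --no-pantsd --no-local-cache --lint-per-file-caching lint dir::'

# Warm cache: keep pantsd and caching on; dirty one placeholder file between timed runs
hyperfine --warmup 1 --prepare 'touch dir/some_module.py' \
  './pants --lint-per-file-caching lint dir::'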
b
So to start, I am only testing with a warm cache, and saw the above results 🤔
No files changed, so ideally the warmest possible cache
I guess with the cache, I'd expect that after an initial run with either option, the run would be blazingly fast with no edits 🤔
h
It should be! It's not caching the result? If so, that's a bug
b
OK, so one thing I noticed was that stdout was being blasted by `11:51:08.36 [INFO] Completed: Lint using Pylint - Pylint succeeded.`, so I'm going to try again with `-lwarn`. I also noticed I was running with `./pants_from_sources`. Perhaps that does something different than just `./pants`
No, the timing was right:
Copy code
joshuacannon@CEPHANDRIUS:~/work/techlabs$ time ./pants -lwarn --isort-skip --yapf-skip --lint-per-file-caching lint ::
...
real    1m14.204s
user    0m0.398s
sys     0m0.058s
h
`./pants_from_sources` doesn't use pantsd by default
😅 1
b
Something must be funky with my setup 🤔
Although, I'd hope that even without the daemon, the disk cache would make this fast, but perhaps the 70-ish seconds is mostly reading the cache?
h
It definitely should not be. To confirm: `./pants --no-pantsd lint path/to/f.py` is not caching from disk when you run it three times in a row, regardless of `--per-file-caching`? I can't reproduce that, so I'm trying to figure out what's up
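i.e. roughly:
Copy code
# With a working disk cache, runs 2 and 3 should be dramatically
# faster than run 1, even without pantsd.
for i in 1 2 3; do time ./pants --no-pantsd lint path/to/f.py; done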
b
It is:
Copy code
12:26:59.55 [INFO] Counters:
  local_cache_read_errors: 0
  local_cache_requests: 36
  local_cache_requests_cached: 27
  local_cache_requests_uncached: 9
  local_cache_total_time_saved_ms: 145684
  local_cache_write_errors: 0
  local_execution_requests: 9
  local_process_total_time_run_ms: 952
I guess if there were a good way to see where the time was being spent by `pants`, this'd be a lot easier to reason about 😕
(NOTE: That info is from a run on a subset of our monorepo)
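(One thing I do notice in those counters: `local_process_total_time_run_ms` is under a second, so almost none of the wall clock is the linters actually running. Something like this at least separates process time from everything else, assuming `--stats-log` is what's emitting those counters in this version:)
Copy code
# Print counters at the end of the run: process time vs. cache hits
./pants --stats-log --no-pantsd lint dir::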
(I'm about to jump off for the day, we can pick this up tomorrow)
👋 1