Why can't `lint-per-file-process` use `repository....
# development
b
Why can't
lint-per-file-process
use
repository.pex
directly?
Screenshot from 2022-01-04 15-34-32.png
h
You can use this option in 2.9 to skip that: https://www.pantsbuild.org/v2.9/docs/reference-python#section-run-against-entire-lockfile See "Fine-grained resolves without a performance hit" in https://blog.pantsbuild.org/introducing-pants-2-5/ for what's going on
👀 1
b
Oh interesting. in the short term that's certainly faster. I wonder if the long-term which one will win 🤔
h
I think the other way around - with a cold cache, it's more work because you have to do the extraction step. But it means a warm cache has much less invalidation
b
I meant in terms of weeks. So long as our lockfile never changes thats faster. But when it changes we pay the penalty. Of course if the work to resolve all the little requirements.pex is more than the work it takes to rebuild the big kahuna, it's still worth it
Also, I wonder if
repository.pex
can be (and is) remotely cached between 2 disperate systems
h
I wonder if repository.pex can be (and is) remotely cached between 2 disperate systems
Remote caching works awesome currently in CI, where you have multiple distinct machines but all with the same PATH and setup etc It doesn't work as well with desktop usage because systems tend to diverge in ways that invalidate the cache. https://github.com/pantsbuild/pants/issues/12203 tracks how we might improve that
w
note also though that there are some significant improvements to subsetting in
2.9.0rc2
: see https://github.com/pantsbuild/requirements-perf/issues/2
2
h
We should highlight that in release blog
w
and there is another bit of low hanging fruit with subsetting: we’ve added support for reusing immutable inputs to sandboxes, and the
repository.pex
should definitely be one. https://github.com/pantsbuild/pants/issues/14070
2
b
Should the path be an input to creating the repository pex?
w
currently: yes unfortunately. many python artifact builds require e.g. a C compiler
1
b
Ah right
w
addressing https://github.com/pantsbuild/pants/issues/12203 without boiling the entire ocean will be a fine line to walk
b
I think the measurement based approach is a good one
I think
run-against-entire-lockfile
is possibly incorrectly implemented. That or
pylint
just sucks. The more files I throw at it, the longer it takes w.r.t. turning it off.
All my `pylint`s are taking waaaya longer than they should
It took 39 minutes to lint my repo, vs. 12.3 with it turned off. No caching
w
i’m guessing that pylint has a fair amount of per-file overhead, as described in the doc that Eric linked
oh, sorry. you mean 12.3 vs 39 with/without run-against-entire-lockfile ?
b
Yes
I'm still guessing it's pylint to blame. but that's just really unfortunate
w
that’s maybe not surprising… using
run-against-entire-lockfile
means materializing all of your requirements for each lint process
b
Ahhhhh that might be it
w
and then if
pylint
is actually consuming all of them…?
h
b
@hundreds-father-404 I was just thinking that 🙂
w
@bitter-ability-32190: to confirm, you could use your flamegraphy tool to look at sandbox creation time: that workunit is called
setup_sandbox
@hundreds-father-404: the former bit (sandbox creation time) would be improved by https://github.com/pantsbuild/pants/issues/14070 yes.
b
I could, yes. I'd just need to dust it off and context switch 😂
w
but time spent in pylint will not be.
fwiw: i’m planning to start https://github.com/pantsbuild/pants/issues/13462 later this week, and i’m hopeful that it will be able to eliminate the choice between per-file linting and “all at once” linting
b
For the curious, I assume that involves parsing the output of the tools. Any ideas on how to do that?
For pylint/flake8, both are part of PyCQA so you might approach them to set some kind of "standard" for lint tools to output a standard format. And see if
mypy
will play too. Otherwise, I know
flake8
allows custom reporters, could implement one of those...
w
i’m hoping to constrain what a
@rule
is able to consume when the feature is enabled… we’ll see how feasible that is.
👀 1
m
@witty-crayon-22786 here’s a live example -
w
@melodic-thailand-99227: the
--pex-verbosity
setting (https://www.pantsbuild.org/docs/reference-pex#section-verbosity) can render what PEX is actually doing there… could try with
3
or so, and see if it highlights how time is being spent. but as mentioned, it should all just be time spent in pip.
i’ve been thinking about how this relates to the upcoming native support for lockfiles, and it seems like it boils down to seeing whether producing the lockfile can do less than actually building/installing all of the wheels, and instead defer that until they’re actually used. because at that point, your steps would be: 1. change your requirements, and update your lockfile (this might take some time, but ideally would not install anything) 2. commit the new lockfile 3. consumers would not actually resolve the lockfile contents: only build the wheels necessary to consume subsets of it
💯 1
…in the end, this also seems related to the discussion that’s been happening on https://github.com/pantsbuild/pex/issues/1423#issuecomment-1002296750 … commented there.
b
(Commenting on the issue as well for vis) So it turns out there is a "hack" that can be used in the meantime. Setting the relevant parallelism options for each tool turns out to be a "poor-mans-batch" is faster than not setting it (obviously) but also faster than the
-per-file-caching
. E.g. If I wanted to run
fmt
with
isort
and
black
and I have a 64 core machine. Running with
--black-args='-W 64' --isort-args='--jobs=64'
is always the fastest option. Whereas with
lint
(which runs in parallel) you probably want to take N cores and divide by M checkers...
--black-args='-W 16' --isort-args='--jobs=16' --flake8-args='--jobs=16' --pylint-args='--jobs=16'
. Depending on how long fixing the issue might take, it might be worth finding/writing clever little plugins to set the num jobs smartly based on CPU count
The issue has an estimate of ~ 2 weeks, so likely isn't worth the time spent making the little plugins, but interesting nonetheless.
w
that is really useful data: thanks! https://github.com/pantsbuild/pants/issues/9964 suggests that we might be able calculate that value dynamically
it doesn’t address the “too many arguments” aspect of https://github.com/pantsbuild/pants/issues/13462, but perhaps some combination of the two issues is what is called for.
b
Edit: I misunderstood the issue linked
There's several GitHub issues at play, and I'm wondering how they'll fit together. Let me know if you need a tester for whatever features or combo of features.
w
thanks.
i’ll write something up before the end of the day to summarize what i’m thinking, because i think that it is beginning to solidify.
🙌 1
b
radical