Why can t `lint per file process` use `repository pex` direc Pants #development

Why can't `lint-per-file-process` use `repository....

bitter-ability-32190

01/04/2022, 9:33 PM

Why can't

lint-per-file-process

use

repository.pex

directly?

bitter-ability-32190

01/04/2022, 9:35 PM

Screenshot from 2022-01-04 15-34-32.png

hundreds-father-404

01/04/2022, 9:45 PM

You can use this option in 2.9 to skip that: https://www.pantsbuild.org/v2.9/docs/reference-python#section-run-against-entire-lockfile See "Fine-grained resolves without a performance hit" in https://blog.pantsbuild.org/introducing-pants-2-5/ for what's going on

👀 1

bitter-ability-32190

01/04/2022, 9:49 PM

Oh interesting. in the short term that's certainly faster. I wonder if the long-term which one will win 🤔

hundreds-father-404

01/04/2022, 9:50 PM

I think the other way around - with a cold cache, it's more work because you have to do the extraction step. But it means a warm cache has much less invalidation

bitter-ability-32190

01/04/2022, 9:52 PM

I meant in terms of weeks. So long as our lockfile never changes thats faster. But when it changes we pay the penalty. Of course if the work to resolve all the little requirements.pex is more than the work it takes to rebuild the big kahuna, it's still worth it

bitter-ability-32190

01/04/2022, 9:53 PM

Also, I wonder if

repository.pex

can be (and is) remotely cached between 2 disperate systems

hundreds-father-404

01/04/2022, 10:00 PM

I wonder if repository.pex can be (and is) remotely cached between 2 disperate systems

Remote caching works awesome currently in CI, where you have multiple distinct machines but all with the same PATH and setup etc It doesn't work as well with desktop usage because systems tend to diverge in ways that invalidate the cache. https://github.com/pantsbuild/pants/issues/12203 tracks how we might improve that

witty-crayon-22786

01/04/2022, 10:10 PM

note also though that there are some significant improvements to subsetting in

2.9.0rc2

: see https://github.com/pantsbuild/requirements-perf/issues/2

➕ 2

hundreds-father-404

01/04/2022, 10:12 PM

We should highlight that in release blog

witty-crayon-22786

01/04/2022, 10:14 PM

and there is another bit of low hanging fruit with subsetting: we’ve added support for reusing immutable inputs to sandboxes, and the

repository.pex

should definitely be one. https://github.com/pantsbuild/pants/issues/14070

➕ 2

bitter-ability-32190

01/04/2022, 10:29 PM

Should the path be an input to creating the repository pex?

witty-crayon-22786

01/04/2022, 10:30 PM

currently: yes unfortunately. many python artifact builds require e.g. a C compiler

➕ 1

bitter-ability-32190

01/04/2022, 10:30 PM

Ah right

witty-crayon-22786

01/04/2022, 10:31 PM

addressing https://github.com/pantsbuild/pants/issues/12203 without boiling the entire ocean will be a fine line to walk

bitter-ability-32190

01/04/2022, 10:33 PM

I think the measurement based approach is a good one

bitter-ability-32190

01/06/2022, 5:39 PM

I think

run-against-entire-lockfile

is possibly incorrectly implemented. That or

pylint

just sucks. The more files I throw at it, the longer it takes w.r.t. turning it off.

bitter-ability-32190

01/06/2022, 5:40 PM

All my `pylint`s are taking waaaya longer than they should

bitter-ability-32190

01/06/2022, 5:43 PM

It took 39 minutes to lint my repo, vs. 12.3 with it turned off. No caching

witty-crayon-22786

01/06/2022, 5:43 PM

i’m guessing that pylint has a fair amount of per-file overhead, as described in the doc that Eric linked

witty-crayon-22786

01/06/2022, 5:44 PM

oh, sorry. you mean 12.3 vs 39 with/without run-against-entire-lockfile ?

bitter-ability-32190

01/06/2022, 5:44 PM

Yes

bitter-ability-32190

01/06/2022, 5:45 PM

I'm still guessing it's pylint to blame. but that's just really unfortunate

witty-crayon-22786

01/06/2022, 5:45 PM

that’s maybe not surprising… using

run-against-entire-lockfile

means materializing all of your requirements for each lint process

bitter-ability-32190

01/06/2022, 5:45 PM

Ahhhhh that might be it

witty-crayon-22786

01/06/2022, 5:45 PM

and then if

pylint

is actually consuming all of them…?

hundreds-father-404

01/06/2022, 5:45 PM

Which we can fix via https://github.com/pantsbuild/pants/issues/14070 right?

bitter-ability-32190

01/06/2022, 5:46 PM

@hundreds-father-404 I was just thinking that 🙂

witty-crayon-22786

01/06/2022, 5:46 PM

@bitter-ability-32190: to confirm, you could use your flamegraphy tool to look at sandbox creation time: that workunit is called

setup_sandbox

witty-crayon-22786

01/06/2022, 5:46 PM

@hundreds-father-404: the former bit (sandbox creation time) would be improved by https://github.com/pantsbuild/pants/issues/14070 yes.

bitter-ability-32190

01/06/2022, 5:46 PM

I could, yes. I'd just need to dust it off and context switch 😂

witty-crayon-22786

01/06/2022, 5:46 PM

but time spent in pylint will not be.

witty-crayon-22786

01/06/2022, 5:48 PM

fwiw: i’m planning to start https://github.com/pantsbuild/pants/issues/13462 later this week, and i’m hopeful that it will be able to eliminate the choice between per-file linting and “all at once” linting

bitter-ability-32190

01/06/2022, 5:50 PM

For the curious, I assume that involves parsing the output of the tools. Any ideas on how to do that?

bitter-ability-32190

01/06/2022, 5:51 PM

For pylint/flake8, both are part of PyCQA so you might approach them to set some kind of "standard" for lint tools to output a standard format. And see if

mypy

will play too. Otherwise, I know

flake8

allows custom reporters, could implement one of those...

witty-crayon-22786

01/06/2022, 5:52 PM

i’m hoping to constrain what a

@rule

is able to consume when the feature is enabled… we’ll see how feasible that is.

👀 1

melodic-thailand-99227

01/06/2022, 8:48 PM

@witty-crayon-22786 here’s a live example -

witty-crayon-22786

01/06/2022, 9:26 PM

@melodic-thailand-99227: the

--pex-verbosity

setting (https://www.pantsbuild.org/docs/reference-pex#section-verbosity) can render what PEX is actually doing there… could try with

or so, and see if it highlights how time is being spent. but as mentioned, it should all just be time spent in pip.

witty-crayon-22786

01/06/2022, 9:30 PM

i’ve been thinking about how this relates to the upcoming native support for lockfiles, and it seems like it boils down to seeing whether producing the lockfile can do less than actually building/installing all of the wheels, and instead defer that until they’re actually used. because at that point, your steps would be: 1. change your requirements, and update your lockfile (this might take some time, but ideally would not install anything) 2. commit the new lockfile 3. consumers would not actually resolve the lockfile contents: only build the wheels necessary to consume subsets of it

💯 1

witty-crayon-22786

01/06/2022, 9:35 PM

…in the end, this also seems related to the discussion that’s been happening on https://github.com/pantsbuild/pex/issues/1423#issuecomment-1002296750 … commented there.

bitter-ability-32190

01/07/2022, 3:37 PM

(Commenting on the issue as well for vis) So it turns out there is a "hack" that can be used in the meantime. Setting the relevant parallelism options for each tool turns out to be a "poor-mans-batch" is faster than not setting it (obviously) but also faster than the

-per-file-caching

. E.g. If I wanted to run

fmt

with

isort

and

black

and I have a 64 core machine. Running with

--black-args='-W 64' --isort-args='--jobs=64'

is always the fastest option. Whereas with

lint

(which runs in parallel) you probably want to take N cores and divide by M checkers...

--black-args='-W 16' --isort-args='--jobs=16' --flake8-args='--jobs=16' --pylint-args='--jobs=16'

. Depending on how long fixing the issue might take, it might be worth finding/writing clever little plugins to set the num jobs smartly based on CPU count

bitter-ability-32190

01/07/2022, 3:40 PM

The issue has an estimate of ~ 2 weeks, so likely isn't worth the time spent making the little plugins, but interesting nonetheless.

witty-crayon-22786

01/07/2022, 4:58 PM

that is really useful data: thanks! https://github.com/pantsbuild/pants/issues/9964 suggests that we might be able calculate that value dynamically

witty-crayon-22786

01/07/2022, 4:59 PM

it doesn’t address the “too many arguments” aspect of https://github.com/pantsbuild/pants/issues/13462, but perhaps some combination of the two issues is what is called for.

bitter-ability-32190

01/07/2022, 5:04 PM

Edit: I misunderstood the issue linked

bitter-ability-32190

01/07/2022, 5:11 PM

There's several GitHub issues at play, and I'm wondering how they'll fit together. Let me know if you need a tester for whatever features or combo of features.

witty-crayon-22786

01/07/2022, 5:11 PM

thanks.

witty-crayon-22786

01/07/2022, 5:16 PM

i’ll write something up before the end of the day to summarize what i’m thinking, because i think that it is beginning to solidify.

🙌 1

bitter-ability-32190

01/07/2022, 5:18 PM

radical

Open in Slack

Previous Next