# development
b
FYI @witty-crayon-22786 Eric's https://github.com/pantsbuild/pants/pull/16761 is prep-work for the lint change (https://github.com/pantsbuild/pants/pull/16735) because:
• Pylint receives all field sets and then needs to partition (by ICs/resolve)
• The ICs/resolve metadata is attached to each partition
• `lint` then batches the partitions and hands those batches with their metadata back to `pylint` for running
• Therefore the metadata must be valid for each batch in a partition, because the metadata isn't batched
So we need to make the partition metadata be performantly valid for all possible batches of the partition's FieldSets. That's the context behind the change, in case that informs your review thoughts. Eventually the pylint runner gets a bag of field_sets and static metadata applying to that-bag-or-bigger.
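To make that flow concrete, here is a minimal sketch with toy stand-in types (these are not the real Pants `FieldSet`/partition classes, and `partition_by_resolve`/`batch` are hypothetical names):

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterator

# Toy stand-ins for illustration only; not the real Pants types.
@dataclass(frozen=True)
class FieldSet:
    address: str
    resolve: str  # stands in for the ICs/resolve key


@dataclass(frozen=True)
class Partition:
    field_sets: tuple[FieldSet, ...]
    metadata: str  # must be valid for *any* batch sliced from field_sets


def partition_by_resolve(field_sets: list[FieldSet]) -> list[Partition]:
    """The tool (e.g. pylint) groups all field sets by resolve and attaches metadata."""
    by_resolve: dict[str, list[FieldSet]] = {}
    for fs in field_sets:
        by_resolve.setdefault(fs.resolve, []).append(fs)
    return [Partition(tuple(fss), resolve) for resolve, fss in by_resolve.items()]


def batch(partition: Partition, size: int) -> Iterator[tuple[tuple[FieldSet, ...], str]]:
    """lint slices each partition into batches, carrying the partition-wide metadata along."""
    it = iter(partition.field_sets)
    while chunk := tuple(islice(it, size)):
        yield chunk, partition.metadata  # the metadata itself is never re-computed per batch
```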
w
yea, comment still applies. `lint` would want to partition `CoarsenedTargets` then, probably.
👍 1
b
Thanks for thinking through it 👍👍 much appreciated
w
at least assuming that `pylint` and `mypy` both need transitive deps, rather than just one level of deps, for example
b
... wait lol just double-read that. Can you elaborate? Can we not partition by ICs/resolve, then when running pylint request CoarsenedTargets on the field set batch? Or is that what you're suggesting?
w
Can we not partition by ICs/resolve, then when running pylint request CoarsenedTargets on the field set batch?
no: as mentioned in my comment, that would trigger #11270: https://github.com/pantsbuild/pants/pull/16761#discussion_r962063128
you want to compute CTs once, and then re-use them, rather than computing them in each caller (ditto not computing TTs in a bunch of spots)
if we fix #11270, it might be another story… but that’s lower in my queue than the docker/remote-exec stuff right now
no: as mentioned in my comment, that would trigger #11270
hm. so, i suppose i didn’t mention in there that #11270 applies to both CT and TT. the thing that #11270 is about is avoiding making lots of requests for CTs/TTs (i.e. a request per root or partition), if possible.
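Rough sketch of that performance point (the `get_coarsened_targets` helper is hypothetical, standing in for whatever rule actually computes CTs; this is not real Pants rule code):

```python
# Hypothetical helpers, for illustration only.

def lint_with_per_partition_requests(partitions, get_coarsened_targets):
    # One CT/TT request per partition: each call may trigger its own graph walk (#11270).
    return [run_tool(p, get_coarsened_targets(p.field_sets)) for p in partitions]


def lint_with_one_upfront_request(all_field_sets, partitions, get_coarsened_targets):
    # One CT/TT request up front; every partition (and batch) reuses/subsets the result.
    cts = get_coarsened_targets(all_field_sets)
    return [run_tool(p, cts) for p in partitions]


def run_tool(partition, coarsened_targets):
    ...  # build the sandbox from the (possibly over-broad) coarsened targets and run
```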
@bitter-ability-32190: anyway: happy to help unblock https://github.com/pantsbuild/pants/pull/16735 , but i don’t actually understand the blocker right now
b
I guess what we could do is attach the CT/TT for a partition with the partition, and in the "runner" construct the sandbox using a possibly-over-populated sandbox for the partition's CT/TT (even though we're linting a subset). And the reason we'd do it is because materializing a potentially-larger sandbox is cheaper than calculating CT/TT in each "runner". Do I have that right?
w
I guess what we could do is attach the CT/TT for a partition with the partition, and in the “runner” construct the sandbox using a possibly-over-populated sandbox for the partition’s CT/TT (even though we’re linting a subset).
it’s more that what you partition is CoarsenedTargets, not Targets.
so there shouldn’t be any over-large sandboxes
b
Well the result from the per-tool partitioning is then split by `lint.py` into "batches" which are subsets. So either we get the CT/TT for the "batch" in the runner, or we've passed a hint along from the partitioner (which would be maybe over-specified for the batch)
w
there might be a pluralization issue here i think: if you have a list of CTs (`CoarsenedTargets`), then you can partition it into one or more lists of CTs
or batch, or whatever word
i.e.
```python
def partition(cts: CoarsenedTargets) -> list[CoarsenedTargets]:
    ...
```
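One way to read that signature, with simplified stand-ins for the engine types (the `resolve` key here is just an example partitioning criterion):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoarsenedTarget:          # simplified stand-in for the engine type
    members: tuple[str, ...]    # addresses in the coarsened component
    resolve: str                # hypothetical partition key


class CoarsenedTargets(tuple):  # a CT collection is itself just a sequence of CTs
    pass


def partition(cts: CoarsenedTargets) -> list[CoarsenedTargets]:
    """Split one collection of CTs into several (e.g. by resolve) without re-computing any CT."""
    by_key: dict[str, list[CoarsenedTarget]] = {}
    for ct in cts:
        by_key.setdefault(ct.resolve, []).append(ct)
    return [CoarsenedTargets(group) for group in by_key.values()]
```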
b
Example:
• `lint.py` finds 26 field sets: A..Z
• `lint.py` requests partitions from `pylint`
• `pylint` partitions the field sets by some criteria and attaches some metadata.
  ◦ Let's say it's by vowel/consonant
  ◦ So `pylint` returns back 2 partitions: `[A,E,I,O,U]+"vowel"` and `[<consonant>]+"consonant"`
• `lint.py` takes that and batches it for cache perf.
  ◦ let's say batch size is 5
  ◦ `lint.py` gives the `pylint` runner 6 _batches_:
    ▪︎ `[A,E,I,O,U]+"vowel"` + 5 batches of `[<consonant subset>]+"consonant"`
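Numerically, that walk-through checks out (a toy sketch; the `batches` helper is made up):

```python
from itertools import islice
from string import ascii_uppercase

field_sets = list(ascii_uppercase)                     # 26 field sets: A..Z
vowels = [fs for fs in field_sets if fs in "AEIOU"]
consonants = [fs for fs in field_sets if fs not in "AEIOU"]

# pylint returns 2 partitions, each with its metadata:
partitions = [(vowels, "vowel"), (consonants, "consonant")]


def batches(members, metadata, size=5):
    it = iter(members)
    while chunk := list(islice(it, size)):
        yield chunk, metadata


all_batches = [b for members, md in partitions for b in batches(members, md)]
# 1 vowel batch + 5 consonant batches (21 consonants split into chunks of <= 5) == 6 batches
assert len(all_batches) == 6
```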
w
`lint.py` requests partitions from `pylint`; `pylint` partitions the field sets by some criteria and attaches some metadata.
don’t partition field sets: partition CTs
and yea, this is annoying. fixing 11270 would make this less/un-necessary.
if you have time to dive in on that one i can try to sketch out more of the design. it’s going to involve some munging in `graph.rs` though
b
don’t partition field sets: partition CTs
In `pylint`? or in `lint`? I think I'm still off one stepping stone 🙂
w
both. to avoid 11270 you’d want to preserve the CTs all the way through
b
I don't think `lint.py` should be partitioning by `CT`s, no? That's wasteful for non-dependent linters
w
mm, yea. `pylint` being in `check` would make this easier, since all of those implementations require transitive deps.
b
That ship sailed 😛
w
think of it this way: you want to make as few calls to create `TTs`/`CTs` as you can. if you can’t create them at the very beginning of the process, then create them as early as you can
but it’s just about performance (avoiding 11270). it’s not less correct.
b
We'll be constructing the sandbox for each batch (subset of partition) using the full partition's TTs/CTs since we assume materializing that is cheaper perf-wise than re-computing
w
possibly. pylint could probably fix/subset the CTs it receives with the batch
but yea
i.e.
subset = [ct for ct in cts if any(m in my_batch for m in ct.members)]
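Spelled out with a toy stand-in type (here `my_batch` is just whatever addresses the runner received for this batch):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoarsenedTarget:           # toy stand-in for the engine type
    members: tuple[str, ...]


def subset_for_batch(cts: list[CoarsenedTarget], my_batch: set[str]) -> list[CoarsenedTarget]:
    """Keep only the CTs that contain at least one member of this batch."""
    return [ct for ct in cts if any(m in my_batch for m in ct.members)]


cts = [CoarsenedTarget(("a", "b")), CoarsenedTarget(("c",)), CoarsenedTarget(("d", "e"))]
assert subset_for_batch(cts, {"a", "d"}) == [cts[0], cts[2]]
```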
b
booyah
So the takeaway here is:
• In Eric's https://github.com/pantsbuild/pants/pull/16761 the Partition object needs to hold the CTs
• In my PR (post-Eric's change) we attach the CTs as metadata to the partition, then in the runner use-and-subset it
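Putting the two bullets together, the shape might look roughly like this (hypothetical field and function names; the actual Partition type in those PRs may differ):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CT:                                  # toy stand-in for CoarsenedTarget
    members: tuple[str, ...]


@dataclass(frozen=True)
class Partition:
    field_sets: tuple[str, ...]            # what lint.py slices into batches
    metadata: str                          # tool-specific, e.g. the resolve
    coarsened_targets: tuple[CT, ...]      # computed once, for the whole partition


def run_batch(partition: Partition, batch: tuple[str, ...]) -> list[CT]:
    # The runner subsets the partition-wide CTs down to this batch,
    # then builds the sandbox / invokes the tool from that subset.
    return [
        ct for ct in partition.coarsened_targets
        if any(m in batch for m in ct.members)
    ]
```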
w
sounds right
b
🙏 thank you
Ah I see it now. We can make one call to `CoarsenedTargets` for all field sets, then subset that as necessary!
w
yea
b
Another piece of this I maybe should've brought up earlier is that the core `lint` code doesn't actually care that the partitions/batches are of field sets after the initial partitioning. The only reason it matters is the dataclass attribute name. I'm hoping to keep it that way, for sanity 🙂 (The alternative being to split a partition of CTs into batches of CTs)
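For what it's worth, a type-agnostic batch could be sketched as a small generic container, since `lint.py` only needs something it can slice and iterate (a hedged sketch, not the actual PR code):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

_E = TypeVar("_E")  # element type: a FieldSet today, perhaps a CoarsenedTarget later


@dataclass(frozen=True)
class Batch(Generic[_E]):
    elements: tuple[_E, ...]   # lint.py only needs to slice and iterate these
    metadata: str


def make_batches(elements: tuple[_E, ...], metadata: str, size: int) -> list[Batch[_E]]:
    return [Batch(elements[i:i + size], metadata) for i in range(0, len(elements), size)]
```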
w
ah, didn’t realize you had also edited it
b
Oh I posted the comment just after I pushed 🙂
Oh hmmmmmmmmm, one issue with this I think is that if any target in the closure of the partition changes, it'll invalidate all batches, as the metadata attached is for the entire partition. I think I'm gonna change the PR to have batches be type-agnostic, then we can batch the `CoarsenedTargets`. Bueno?
h
Oh hmmmmmmmmm, one issue with this I think is that if any target in the closure of the partition changes, it'll invalidate all batches, as the metadata attached is for the entire partition.
that sounds right
I think I'm gonna change the PR to have batches be type-agnostic, then we can batch the CoarsenedTargets.
vs?
b
Right now it expects `field_sets`, however it isn't a type-restriction but just a matter of the name of the attribute (it made sense to do it this way and involved fewer refactors to keep the old name). Once `lint.py` gets a partition object, it no longer cares about the type past the fact that it can make batches (so an iterable). The batches are then passed back to the "runner" and results are collected
h
Sounds reasonable to me. Maybe land this as a prefactor?
b
I'm not sure that's possible 😞 It's all intertwined
Oh but maybe it would work as an annoying prefactor if we make the Request type a generic and change the attribute name 🙂
👍 1
So many incremental PRs 😵
OK tried it as a prefactor and it just didn't look right. Changing it up like that was really out-of-context 😕
Oh womp womp. `RequirementsPexRequest` does a transitive walk, as does `PythonSourceFilesRequest`. Knowing that, it makes me think because of memoization we might be better off sticking to TTs
w
it should be fine either way, in the sense that in the worst case, computing the CTs adds one additional graph walk
memoization is good, but when the inputs are arbitrary like this, you can’t always be sure that those lookups aren’t going to be doing any filtering, etc.
11270 will eventually make arbitrary lookups cheaper, but not in the next few weeks.
b
The implementation certainly is more transparent with TT, since we slice FieldSets
We're no worse off (I suspect slightly better) than 2.14 even using TTs
Turned out to be easier using CTs 🙂
👍 1