Just for fun I just finished PoC ... Pants #development


bitter-ability-32190

07/21/2022, 5:00 PM

(Just for fun) I just finished a PoC of "InferDependenciesRequest should support/allow batching". On the Pants repo, with just adding batched inference of Python imports:

• An uncached, no-pantsd `peek ::` went from real:user `0m17.976s : 1m20.140s` to `0m15.957s : 0m44.013s`

• Capping it at 4 cores (I have 64 😉) went from `0m25.812s : 1m10.910s` to `0m16.730s : 0m43.133s`

I'll try it in my repo here in a few days to give more metrics + "touched file" metrics, but looks promising (oh and I didn't BLOW through the UI, lol)

❤️ 2

🐇 2

bitter-ability-32190

07/21/2022, 5:15 PM

reminder that dependency expansion doesn't happen in "batches" on the specs' targets like `fmt`/`lint`, but rather in "batches" as we traverse the dependency tree (e.g. all nodes at each level as we traverse become a batch)
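The level-by-level batching described above can be sketched roughly as follows. This is only an illustration, not the code in `graph.py`: `transitive_closure_in_batches`, `fake_infer`, and the `GRAPH` contents are hypothetical names and data made up for the example.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical dependency graph: file -> files it imports directly.
GRAPH: Dict[str, Set[str]] = {
    "a.py": {"b.py", "c.py"},
    "b.py": {"d.py"},
    "c.py": {"d.py"},
    "d.py": set(),
}


def fake_infer(batch: FrozenSet[str]) -> Dict[str, Set[str]]:
    """Stand-in for one batched inference call (one process for the whole batch)."""
    return {path: GRAPH[path] for path in batch}


def transitive_closure_in_batches(
    roots: Iterable[str],
    infer_batch: Callable[[FrozenSet[str]], Dict[str, Set[str]]],
) -> Tuple[Set[str], List[FrozenSet[str]]]:
    """Walk the graph breadth-first; every level of the walk is one batch."""
    visited: Set[str] = set()
    batches: List[FrozenSet[str]] = []
    frontier = frozenset(roots)
    while frontier:
        batches.append(frontier)
        visited |= frontier
        next_level: Set[str] = set()
        for deps in infer_batch(frontier).values():
            next_level |= deps
        # Only nodes we have not seen yet go into the next batch.
        frontier = frozenset(next_level - visited)
    return visited, batches


closure, batches = transitive_closure_in_batches(["a.py"], fake_infer)
```

Here the batches are the levels of the walk from `a.py`, so what ends up in a batch depends on which roots the traversal started from.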

bitter-ability-32190

07/21/2022, 5:23 PM

Oh also, this might need changes or get blasted by https://github.com/pantsbuild/pants/issues/11270 So, still very much just PoC

happy-kitchen-89482

07/21/2022, 6:54 PM

I thought dep inference had to look at the whole repo? how does this interact with traversal?

bitter-ability-32190

07/21/2022, 6:56 PM

If I want the transitive deps of `a.py`, I don't need to look at the repo. Just run inference on `a.py`, which gives me its direct deps. Then I need to run inference again on each of those, which gives me .... Or in code: https://github.com/pantsbuild/pants/blob/29750e6ea4cc23ea59ff4a027d587f19f2a63304/src/python/pants/engine/internals/graph.py#L525

hundreds-father-404

07/21/2022, 9:09 PM

we have to create a global mapping, but don't have to parse every file. Only `dependees` requires parsing every file

➕ 1

bitter-ability-32190

07/21/2022, 9:10 PM

And `peek ::`, 😉

👀 1

bitter-ability-32190

07/21/2022, 9:11 PM

(edited 😛)

👍 1

happy-kitchen-89482

07/21/2022, 10:36 PM

But how do we know which files to look at, given x, y, z?

happy-kitchen-89482

07/21/2022, 10:36 PM

Those are modules

happy-kitchen-89482

07/21/2022, 10:36 PM

Not file paths

bitter-ability-32190

07/21/2022, 10:48 PM

Like Eric said, we globally map files to module names. But that's like a Python loop. Not expensive or async
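A minimal sketch of what such a global file-to-module mapping might look like, assuming a single source root. The helper names (`module_name`, `build_module_mapping`, `resolve_imports`) and the `src/python` root are hypothetical and are not Pants APIs; the point is only that building the mapping is a plain loop over file paths, with no parsing of file contents.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def module_name(path: str, source_root: str = "src/python") -> str:
    """Turn a file path under the (assumed) source root into a dotted module name."""
    rel = PurePosixPath(path).relative_to(source_root).with_suffix("")
    parts = list(rel.parts)
    if parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py maps to the package itself
    return ".".join(parts)


def build_module_mapping(paths: Iterable[str]) -> Dict[str, str]:
    """One pass over the file list: module name -> owning file."""
    return {module_name(p): p for p in paths}


def resolve_imports(imports: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Keep only the imports owned by first-party files in the mapping."""
    return [mapping[m] for m in imports if m in mapping]


mapping = build_module_mapping(
    ["src/python/proj/__init__.py", "src/python/proj/util/strings.py"]
)
owners = resolve_imports(["proj.util.strings", "os"], mapping)
```

With a mapping like this, the module names produced by parsing one file can be turned back into file paths without touching any other file.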

bitter-ability-32190

07/22/2022, 12:23 AM

@witty-crayon-22786 how much will this work be blasted by https://github.com/pantsbuild/pants/issues/11270? Debating taking this from draft to real

witty-crayon-22786

07/22/2022, 4:19 PM

11270 will still have the same batch-expansion loop: it will just be memoized to avoid re-executing batches when other processes are expanding them. but … concurrent executions will be a bit weird. it means that if you do end up having multiple overlapping TransitiveTargets requests concurrently, they won’t share the dependency extraction work.

witty-crayon-22786

07/22/2022, 4:20 PM

but… i expect that my “but” above will apply to your existing patch too? i.e., if one of the slow cases from 11270 (request 100 TTs/CTs in parallel) is executed with your patch, i think that you’ll end up with lots of distinct/unique batches. they’ll hit the process cache, but not the rule cache. so batching would probably make 11270 worse

bitter-ability-32190

07/22/2022, 4:23 PM

My guesstimation is the overhead of inferring on N field sets isn't linear w.r.t N, which makes it an overall win. E.g. inferring deps for 2 FSs isn't 2x one FS, because the majority of the time spent is process setup (by Pants and by the process itself like initializing Python).
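The guess above amounts to a cost model with a fixed per-process setup cost plus a small per-field-set cost. A toy version is sketched below; the 0.5 s and 0.01 s figures are made-up assumptions for illustration, not measurements from Pants.

```python
import math


def total_cost(
    num_field_sets: int, batch_size: int, setup_s: float, per_item_s: float
) -> float:
    """Estimated wall time if each batch pays process setup once."""
    num_processes = math.ceil(num_field_sets / batch_size)
    return num_processes * setup_s + num_field_sets * per_item_s


# Hypothetical numbers: 0.5s of process setup, 0.01s of real parsing per field set.
unbatched = total_cost(100, batch_size=1, setup_s=0.5, per_item_s=0.01)
batched = total_cost(100, batch_size=100, setup_s=0.5, per_item_s=0.01)
```

Under these assumed numbers almost all of the unbatched time is setup, which is why batching looks like a win as long as the same file is not re-extracted in many different batches.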

witty-crayon-22786

07/22/2022, 4:24 PM

sure. but to be clear: the issue with regard to 11270 is that you would end up extracting the deps for a single file 100 times for 100 TTs/CTs in the worst case

witty-crayon-22786

07/22/2022, 4:25 PM

since each time you enter that method, you’re likely to have different batches created

witty-crayon-22786

07/22/2022, 4:26 PM

to solve that (and possibly also to find a cleaner way to do `lint`/`fmt` batching), one thing that you could do would be to have deterministic batches…i.e., a file is always batched with everything in its directory, for example.
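One way to read the "deterministic batches" idea is sketched below, assuming batches are keyed purely by directory. `directory_batches` and `batch_for` are hypothetical helpers, not anything that exists in Pants.

```python
import posixpath
from typing import Dict, FrozenSet, Iterable, Set


def directory_batches(all_files: Iterable[str]) -> Dict[str, FrozenSet[str]]:
    """Group every known file with its directory siblings."""
    by_dir: Dict[str, Set[str]] = {}
    for path in all_files:
        by_dir.setdefault(posixpath.dirname(path), set()).add(path)
    return {d: frozenset(files) for d, files in by_dir.items()}


def batch_for(path: str, batches: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """The batch a file belongs to depends only on its path, never on who asked."""
    return batches[posixpath.dirname(path)]


batches = directory_batches(["a/x.py", "a/y.py", "b/z.py"])
```

Because the batch for `a/x.py` is the same no matter which traversal requested it, two overlapping requests would produce identical batch keys and could share a memoized result.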

witty-crayon-22786

07/22/2022, 4:28 PM

11270 has come up in a few contexts recently, so i’ll likely need to start it next week. but i don’t think that it is a blocker for batching, per-se: you’ll just need to find a way to avoid the issue that is caused by the situation described on 11270

bitter-ability-32190

07/22/2022, 4:29 PM

Can you give a trivial example to help me understand?

witty-crayon-22786

07/22/2022, 4:52 PM

if you run `test` on 10 targets, 10 TransitiveTargets requests will be created for the 10 roots

witty-crayon-22786

07/22/2022, 4:53 PM

each of them will start with a batch for the dependencies of their roots: they will be very unlikely to be identical

witty-crayon-22786

07/22/2022, 4:53 PM

their next batches will be very unlikely to be identical, and so on
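A toy simulation of that scenario is below. It is sequential rather than concurrent, and the graph, file names, and `count_extractions` are all hypothetical. Results are memoized per batch (standing in for rule-level memoization keyed on the batch), and the counter records how often each file actually gets extracted.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical repo: 10 test files, each importing its own helper plus one shared file.
GRAPH: Dict[str, Set[str]] = {
    f"test_{i}.py": {f"helper_{i}.py", "common.py"} for i in range(10)
}
GRAPH.update({f"helper_{i}.py": set() for i in range(10)})
GRAPH["common.py"] = set()


def count_extractions(roots: Iterable[str]) -> Counter:
    """Run one level-batched traversal per root, memoizing on the whole batch."""
    memo: Dict[FrozenSet[str], Dict[str, Set[str]]] = {}
    extractions: Counter = Counter()
    for root in roots:
        visited: Set[str] = set()
        frontier = frozenset({root})
        while frontier:
            if frontier not in memo:
                # Cache miss: every file in this batch is extracted again.
                for path in frontier:
                    extractions[path] += 1
                memo[frontier] = {path: GRAPH[path] for path in frontier}
            visited |= frontier
            next_level = set().union(*memo[frontier].values())
            frontier = frozenset(next_level - visited)
    return extractions


counts = count_extractions(f"test_{i}.py" for i in range(10))
```

In this setup `common.py` always lands in a batch with a different helper, so no two batches containing it are identical and the per-batch memo never helps for that file.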

bitter-ability-32190

07/22/2022, 4:55 PM

Ah ok my understanding was correct. Really without measuring we're shooting in the dark for how "bad" the overhead is

happy-kitchen-89482

07/22/2022, 5:01 PM

Oh, I had assumed we would be caching the results for each file independently?

bitter-ability-32190

07/22/2022, 5:02 PM

That would only work with the synthetic process stuff I've put aside. And even still, would only cache at the process level not the rule level (as the rules would still be on FieldSet batches)
