Is it possible to define targets finer-grained than files? Pants #plugins


flat-zoo-31952

03/12/2022, 9:48 PM

Is it possible to define targets finer-grained than files? I'm trying to imagine how it would be remotely possible to use AST parsing in Python to build a graph where the top-level definitions in modules are the nodes. I realize this is probably crazy and a bad idea for a lot of reasons, but I'm curious whether it's even feasible in the context of Pants' notion of how dependencies work. It seems like you'd maybe need to define another kind of `python_?` target and make generators for that. And then you'd need some sensible way of writing addresses for those targets. 🤔

🤯 3
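A minimal sketch of the idea in the question, using only the standard-library `ast` module: top-level definitions become nodes, and an edge means one definition mentions another by name. The function name `top_level_graph` and the sample module are made up for illustration. This is not how Pants infers dependencies, and it ignores scoping, imports, and attribute access.

```python
import ast


def top_level_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level definition to the other top-level names it references."""
    tree = ast.parse(source)

    # Collect the top-level statements that define a name.
    nodes: dict[str, ast.stmt] = {}
    for stmt in tree.body:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            nodes[stmt.name] = stmt
        elif isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    nodes[target.id] = stmt

    # An edge exists when a definition's body mentions another top-level name.
    # Assumption: no local variable shadows a top-level name.
    graph: dict[str, set[str]] = {}
    for name, stmt in nodes.items():
        used = {n.id for n in ast.walk(stmt) if isinstance(n, ast.Name)}
        graph[name] = (used & set(nodes)) - {name}
    return graph


example = '''
LIMIT = 10

def helper(x):
    return min(x, LIMIT)

def public(x):
    return helper(x) + 1

class Unrelated:
    pass
'''

graph = top_level_graph(example)
for name, deps in graph.items():
    print(name, "->", sorted(deps))
```

In this toy module a change to `Unrelated` would not affect `public`, which is the kind of smaller dependent closure the question is after.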

happy-kitchen-89482

03/12/2022, 10:14 PM

Interesting. What is the use case you're thinking of? One that occurs to me is running tests in parallel even within a single *_test.py file.

bitter-ability-32190

03/12/2022, 10:52 PM

I've always thought comments or whitespace or formatting shouldn't invalidate the cache 🤔

bitter-ability-32190

03/12/2022, 10:53 PM

I wouldn't be surprised if it was possible. But likely would involve a lot of plumbing and wiring. But I could also be wrong and it be straightforward

flat-zoo-31952

03/12/2022, 11:04 PM

We have a ton of large files with a high churn rate that a lot of things depend on transitively. This is a problem but it's not so easy to sort out fast. I want to see if I can reduce the size of the average dependent closure by using a finer grained notion of dependency

flat-zoo-31952

03/12/2022, 11:05 PM

I can do the parsing pretty easily with ast tools, and then I figure I can map the line numbers up with the output from patches that git produces
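A rough sketch of that mapping, assuming the diff text for a single file comes from something like `git diff -U0` so that hunk headers cover only changed lines (with default context lines the result over-approximates). The helper names `changed_lines`, `definition_spans`, and `touched_definitions` are hypothetical.

```python
import ast
import re

# Matches unified-diff hunk headers such as "@@ -10,2 +12,3 @@ def foo():".
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def changed_lines(diff_text: str) -> set[int]:
    """New-file line numbers covered by the hunk headers of one file's diff."""
    lines: set[int] = set()
    for match in HUNK_RE.finditer(diff_text):
        start = int(match.group(1))
        count = int(match.group(2)) if match.group(2) is not None else 1
        if count == 0:
            # Pure deletion: conservatively mark the lines around the gap.
            lines.update({start, start + 1})
        else:
            lines.update(range(start, start + count))
    return lines


def definition_spans(source: str) -> dict[str, tuple[int, int]]:
    """First and last line of each top-level function or class."""
    spans: dict[str, tuple[int, int]] = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above the `def`/`class` line, so include them.
            first = min([stmt.lineno] + [d.lineno for d in stmt.decorator_list])
            spans[stmt.name] = (first, stmt.end_lineno)
    return spans


def touched_definitions(source: str, diff_text: str) -> set[str]:
    """Names of top-level definitions overlapping any changed line."""
    lines = changed_lines(diff_text)
    return {
        name
        for name, (lo, hi) in definition_spans(source).items()
        if any(lo <= n <= hi for n in lines)
    }


source = "def a():\n    return 1\n\n\ndef b():\n    return 2\n"
diff = "@@ -6 +6 @@\n-    return 3\n+    return 2\n"
print(touched_definitions(source, diff))
```

Here only line 6 changed, so only `b` is reported. Changes outside any definition (module-level statements, imports) are not attributed to anything in this sketch.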

bitter-ability-32190

03/12/2022, 11:27 PM

You might have more luck finding a good refactoring tool 😅

hundreds-father-404

03/12/2022, 11:49 PM

Pants's generic dependency code doesn't really care about sources: note some types of "atom" targets:
• single file: `python_source`
• a dir: `go_package`
• no sources: `pex_binary`

Pants's Python backend is heavily centered on the idea of the "atom" targets being based on exactly one file. So you'd need to roughly recreate the Python backend to instead be <insert the atom>-based.

It is theoretically possible to do this all. And you indeed could use target generators to spit out those "atom" targets for you. I think the biggest challenge is figuring out what that atom looks like, and how you apply it to things like Black, MyPy, and Pytest, which tend to be file-based

hundreds-father-404

03/12/2022, 11:50 PM

Pants's generic dependency code doesn't really care about sources

`--changed-since` does though - that cannot get more atomic than a file. And I'm not sure how it could, given that we use `git diff`, which gives a list of file names

bitter-ability-32190

03/13/2022, 12:06 AM

If we're going down the rabbit hole, if `changed` was pluggable, you can then generate an AST diff from the git diff of those files

👍 1
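One way such an AST diff could look, as a sketch only: compare the old and new contents of a changed file definition by definition, using `ast.dump` as a fingerprint. By default `ast.dump` omits line and column attributes, and the parser drops comments, so comment and formatting edits do not register. The function names are hypothetical, and nothing here is an existing Pants hook.

```python
import ast


def fingerprints(source: str) -> dict[str, str]:
    """Position-independent dump of each top-level function or class."""
    result: dict[str, str] = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # include_attributes defaults to False, so line/column numbers
            # are left out and moving code around does not change the dump.
            result[stmt.name] = ast.dump(stmt)
    return result


def changed_definitions(old: str, new: str) -> set[str]:
    """Names that were added, removed, or structurally modified."""
    before, after = fingerprints(old), fingerprints(new)
    return {
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    }


old = "def a():\n    return 1\n\ndef b():\n    return 2\n"
new = (
    "# a new comment\n"
    "def a():\n"
    "    return   1  # reformatted\n"
    "\n\n"
    "def b():\n"
    "    return 3\n"
    "\n"
    "def c():\n"
    "    pass\n"
)
print(sorted(changed_definitions(old, new)))
```

`a` is unchanged despite the comment and spacing edits, while `b` (modified) and `c` (added) are reported. Docstrings are part of the AST, so docstring edits still count as changes in this sketch.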

hundreds-father-404

03/13/2022, 12:41 AM

Yeah the core `--changed-since` rule could be changed to support it - the Rust engine is agnostic to how we implement this all, and the Target API should be flexible enough to handle this. Should we change it? Unclear / prob no. But could we? I think

happy-kitchen-89482

03/13/2022, 1:02 AM

I've always thought comments or whitespace or formatting shouldn't invalidate the cache

That is nuanced, because it would depend on what you're trying to do. For example, it absolutely should invalidate the cache if you're running a linter... But, for example, if you make whitespace changes to a .java file and it compiles to an identical .class file, then everything downstream from that will still be resolved from cache

happy-kitchen-89482

03/13/2022, 1:02 AM

So the cost of invalidating over whitespace need not be severe
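A toy illustration of that point, not Pants's actual caching: each step is keyed by a hash of its input, so a whitespace-only edit reruns the first step, but because that step produces identical output, the downstream step is still a cache hit. Here `ast.dump` stands in for a compiler that emits the same `.class` file, and all names are made up.

```python
import ast
import hashlib

cache: dict[tuple[str, str], str] = {}
runs: list[str] = []  # records which steps actually executed


def cached(step: str, key_input: str, fn) -> str:
    """Run `fn` only if this step has not seen this exact input before."""
    key = (step, hashlib.sha256(key_input.encode()).hexdigest())
    if key not in cache:
        runs.append(step)
        cache[key] = fn(key_input)
    return cache[key]


def build(source: str) -> str:
    # "compile" is keyed on the raw source, so any edit invalidates it.
    compiled = cached("compile", source, lambda s: ast.dump(ast.parse(s)))
    # "test" is keyed on the compiled output, not on the source text.
    return cached("test", compiled, lambda c: f"tested {len(c)} chars")


build("x = 1\n")
build("x   =   1  # reformatted\n")
print(runs)
```

The second build reruns only the cheap "compile" step; the expensive "test" step is served from cache because its input did not change.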

bitter-ability-32190

03/13/2022, 1:11 AM

Ok yeah that's an extremely good point

bitter-ability-32190

03/13/2022, 1:16 AM

I mean you could of course change the cache key behavior based on some criteria. My tests take multitudes longer than my linters. But that is getting into the nuance like you say 😄

happy-kitchen-89482

03/13/2022, 2:26 AM

It's the kind of thing where we'd really need compelling evidence that it is a significant performance win in real-world cases, that makes the work worth it

👍 1

happy-kitchen-89482

03/13/2022, 2:26 AM

because it is no small amount of work...

➕ 1

hundreds-father-404

03/13/2022, 2:28 AM

I mean you could of course change the cache key behavior based on some criteria.

including continuing work, like bug fixes + more complex code slowing down everything else.

v1 had plugin authors manually reasoning about caching, and it was tough to get right!

"There are only two hard things in Computer Science: cache invalidation and naming things."

happy-kitchen-89482

03/13/2022, 7:11 AM

There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors

➕ 3

😂 3

curved-television-6568

03/13/2022, 7:58 AM

Iiuc the use case is for getting a detailed dependency graph. If so, it doesn't have to map to the core engine constructs, right? A plugin could be written to produce said graph without too much trouble I think.

flat-zoo-31952

03/13/2022, 12:56 PM

I wouldn't even propose adding this as a core feature. I was just thinking through this because we are deep in the weeds with our dependency graph. It could be interesting to look into this at some point, but there's a ton of edge cases

flat-zoo-31952

03/13/2022, 12:57 PM

So it would have to be like... An alternate python backend plug-in.

flat-zoo-31952

03/13/2022, 1:12 PM

Anyways, based on these comments I don't think there's any kind of quick win for us here, especially if there's a bunch in the engine that assumes file-based granularity. The compelling reasons I could think of for doing this in the future might be to have better support for languages that have a more tenuous relationship between files and namespaces, like Ruby, PHP, or Clojure. But I think it would be better to wait until use cases like that actually appeared to act on this. For Python in general there's perhaps some practical benefit in, say, not considering typing imports as hard dependencies for testing purposes, but that gets into a bunch of problems quickly, as there are popular libraries (e.g. FastAPI) that use typing constructs at runtime. And for linting and type checking it is a dependency. So it's really not clear, and it probably depends not only on the language but on how people use it

👍 2
