Is it possible to define targets finer-graine...
# plugins
f
Is it possible to define targets finer-grained than files? I'm trying to imagine how it would be remotely possible to use AST parsing in Python to build a graph where the top-level definitions in modules are the nodes. I realize this is probably crazy and a bad idea for a lot of reasons, but I'm curious whether it's even feasible in the context of Pants' notion of how dependencies work. It seems like you'd need to define another kind of `python_?` target and make generators for that. And then you'd need some sensible way of writing addresses for those targets. 🤔
🤯 3
h
Interesting. What is the use case you're thinking of? One that occurs to me is running tests in parallel even within a single *_test.py file.
b
I've always thought comments or whitespace or formatting shouldn't invalidate the cache 🤔
I wouldn't be surprised if it was possible. But it would likely involve a lot of plumbing and wiring. I could also be wrong, and it could turn out to be straightforward.
f
We have a ton of large files with a high churn rate that a lot of things depend on transitively. This is a problem, but it's not so easy to sort out fast. I want to see if I can reduce the size of the average dependent closure by using a finer-grained notion of dependency.
I can do the parsing pretty easily with ast tools, and then I figure I can map the line numbers up with the output from patches that git produces
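A minimal sketch of that idea, using only the standard-library `ast` module. The function names and the `EXAMPLE` module are hypothetical illustrations, not anything from Pants: the code records the line span of each top-level definition, then reports which definitions a set of changed line numbers falls into.

```python
import ast

def top_level_spans(source: str) -> dict[str, tuple[int, int]]:
    """Map each top-level function, class, or simple assignment to its (first, last) line."""
    spans = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above node.lineno, so start the span at the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            spans[node.name] = (start, node.end_lineno)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    spans[target.id] = (node.lineno, node.end_lineno)
    return spans

def touched_definitions(source: str, changed_lines: set[int]) -> set[str]:
    """Return the top-level names whose span contains at least one changed line."""
    return {
        name
        for name, (first, last) in top_level_spans(source).items()
        if any(first <= line <= last for line in changed_lines)
    }

# Hypothetical module used only for illustration.
EXAMPLE = '''\
import os

LIMIT = 10

def helper():
    return LIMIT

class Widget:
    def size(self):
        return helper()
'''

print(touched_definitions(EXAMPLE, {6, 9}))
```

Changes that fall outside every span (imports, module-level statements) match nothing here, so a real implementation would probably have to treat them as touching the whole file.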
b
You might have more luck finding a good refactoring tool 😅
h
Pants's generic dependency code doesn't really care about sources. Note some types of "atom" targets:
• single file: `python_source`
• a dir: `go_package`
• no sources: `pex_binary`
Pants's Python backend is heavily centered on the idea of the "atom" targets being based on exactly one file. So you'd need to roughly recreate the Python backend to instead be <insert the atom>-based. It is theoretically possible to do all of this. And you indeed could use target generators to spit out those "atom" targets for you. I think the biggest challenge is figuring out what that atom looks like, and how you apply it to things like Black, MyPy, and Pytest, which tend to be file-based.
Pants's generic dependency code doesn't really care about sources
`--changed-since` does though - that cannot get more atomic than a file. And I'm not sure how it could, given that we use `git diff`, which gives a list of file names.
b
If we're going down the rabbit hole, if `changed` was pluggable, you could then generate an AST diff from the git diff of those files
👍 1
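As a rough sketch of the first half of that pipeline: the hunk headers in unified diff output (for example from `git diff -U0`) already carry line numbers, so changed line ranges per file can be recovered with a regular expression. The function name and the sample diff below are made up for illustration, and this is not how Pants implements `--changed-since`.

```python
import re

# Hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@ optional context"
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_lines(diff_text: str) -> dict[str, set[int]]:
    """Collect new-side line numbers touched by each hunk, keyed by file path."""
    result: dict[str, set[int]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Simplification: an added line whose content starts with "++ " would
            # also match this check; a stricter parser would track hunk state.
            path = line[4:]
            current = None if path == "/dev/null" else path.removeprefix("b/")
            if current:
                result.setdefault(current, set())
        elif current and (match := HUNK.match(line)):
            start = int(match.group(1))
            count = int(match.group(2)) if match.group(2) is not None else 1
            # A pure deletion has count 0; attribute it to the line it follows.
            result[current].update(range(start, start + count) if count else [start])
    return result

# Hypothetical diff text, shaped like `git diff -U0` output.
DIFF = """\
diff --git a/app/models.py b/app/models.py
index 1111111..2222222 100644
--- a/app/models.py
+++ b/app/models.py
@@ -6 +6 @@ def helper():
-    return LIMIT
+    return LIMIT + 1
@@ -12,0 +13,2 @@ class Widget:
+    def area(self):
+        return 0
"""

print(changed_lines(DIFF))
```

The resulting line sets could then be intersected with per-definition line spans from an AST pass to decide which definitions changed.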
h
Yeah the core `--changed-since` rule could be changed to support it - the Rust engine is agnostic to how we implement this all, and the Target API should be flexible enough to handle this. Should we change it? Unclear / prob no. But could we? I think
h
I've always thought comments or whitespace or formatting shouldn't invalidate the cache
That is nuanced, because it would depend on what you're trying to do. For example, it absolutely should invalidate the cache if you're running a linter... But, for example, if you make whitespace changes to a .java file and it compiles to an identical .class file, then everything downstream from that will still be resolved from cache
So the cost of invalidating over whitespace need not be severe
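To make the trade-off concrete, here is a small sketch of a cache key that ignores comments and whitespace by hashing the parsed AST instead of the raw bytes. `semantic_fingerprint` is a hypothetical helper, not a Pants feature, and it only suits consumers that do not care about formatting (a linter or formatter would still need the raw file contents in its key).

```python
import ast
import hashlib

def semantic_fingerprint(source: str) -> str:
    """Hash the AST of a Python module, ignoring comments, whitespace, and layout."""
    tree = ast.parse(source)
    # ast.dump omits line and column attributes by default, so pure
    # formatting changes produce the same string.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()

original = "x = 1\n"
reformatted = "x   =   1  # a comment\n\n"
edited = "x = 2\n"

print(semantic_fingerprint(original) == semantic_fingerprint(reformatted))
print(semantic_fingerprint(original) == semantic_fingerprint(edited))
```

Docstrings are part of the AST, so editing one still changes the fingerprint.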
b
Ok yeah that's an extremely good point
I mean you could of course change the cache key behavior based on some criteria. My tests take many times longer than my linters. But that is getting into the nuance like you say 😄
h
It's the kind of thing where we'd really need compelling evidence that it is a significant performance win in real-world cases, that makes the work worth it
👍 1
because it is no small amount of work...
➕ 1
h
I mean you could of course change the cache key behavior based on some criteria.
including continuing work, like bug fixes + more complex code slowing down everything else
v1 had plugin authors manually reasoning about caching, and it was tough to get right!
"There are only two hard things in Computer Science: cache invalidation and naming things."
h
There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors
➕ 3
😂 3
c
IIUC the use case is for getting a detailed dependency graph. If so, it doesn't have to map to the core engine constructs, right? A plugin could be written to produce said graph without too much trouble I think.
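As a very rough illustration of what such a standalone graph could look like (independent of any Pants plugin API, which is not shown here), the sketch below links each top-level function or class in one module to the other top-level definitions it references by name. `definition_graph` and `SAMPLE` are hypothetical, and the analysis deliberately ignores local shadowing, cross-module imports, and dynamic lookups.

```python
import ast

def definition_graph(source: str) -> dict[str, list[str]]:
    """Map each top-level def/class to the other top-level defs/classes it names."""
    defs = {
        node.name: node
        for node in ast.parse(source).body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    graph = {}
    for name, node in defs.items():
        # Every identifier that is read (not assigned) anywhere inside the definition.
        used = {
            n.id
            for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
        }
        graph[name] = sorted((used & defs.keys()) - {name})
    return graph

# Hypothetical module used only for illustration.
SAMPLE = '''\
def a():
    return 1

def b():
    return a() + 1

class C:
    def m(self):
        return b()
'''

print(definition_graph(SAMPLE))
```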
f
I wouldn't even propose adding this as a core feature. I was just thinking through this because we are deep in the weeds with our dependency graph. It could be interesting to look into this at some point, but there are a ton of edge cases
So it would have to be like... An alternate python backend plug-in.
Anyways, based on these comments I don't think there's any kind of quick win for us here, especially if there's a bunch in the engine that assumes file-based granularity. The compelling reasons I could think of for doing this in the future might be to have better support for languages that have a more tenuous relationship between files and namespaces, like Ruby, PHP, or Clojure. But I think it would be better to wait until use cases like that actually appear before acting on this. For Python in general there's perhaps some practical benefit to, say, not considering typing imports as hard dependencies for testing purposes, but that gets into a bunch of problems quickly, as there are popular libraries (e.g. FastAPI) that use typing constructs at runtime. And for linting and type checking it is a dependency. So it's really not clear, and it probably depends not only on the language but on how people use it.
👍 2
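For the typing-imports idea, one narrow case that can be detected statically is an import guarded by `if TYPE_CHECKING:`. The sketch below separates those from imports that execute at runtime; the helper names and `MODULE` are hypothetical. It does nothing for the harder case raised above, where an unguarded import is used only in annotations that a library such as FastAPI inspects at runtime.

```python
import ast

def _is_type_checking(test: ast.expr) -> bool:
    # Matches both `TYPE_CHECKING` and `typing.TYPE_CHECKING`.
    if isinstance(test, ast.Name):
        return test.id == "TYPE_CHECKING"
    if isinstance(test, ast.Attribute):
        return test.attr == "TYPE_CHECKING"
    return False

def _imported_modules(statements) -> set[str]:
    modules = set()
    for statement in statements:
        for node in ast.walk(statement):
            if isinstance(node, ast.Import):
                modules.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # Relative imports without a module name (`from . import x`) are skipped.
                modules.add(node.module)
    return modules

def split_imports(source: str) -> tuple[set[str], set[str]]:
    """Return (runtime modules, modules imported only under `if TYPE_CHECKING:`)."""
    runtime, typing_only = [], []
    for statement in ast.parse(source).body:
        if isinstance(statement, ast.If) and _is_type_checking(statement.test):
            typing_only.extend(statement.body)
            runtime.extend(statement.orelse)
        else:
            runtime.append(statement)
    runtime_modules = _imported_modules(runtime)
    return runtime_modules, _imported_modules(typing_only) - runtime_modules

# Hypothetical module used only for illustration.
MODULE = '''\
from typing import TYPE_CHECKING
import json

if TYPE_CHECKING:
    from decimal import Decimal

def dump(value: "Decimal") -> str:
    import textwrap
    return textwrap.fill(json.dumps(str(value)))
'''

print(split_imports(MODULE))
```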