# development
h
There will be a rule to infer Python dependencies vs. one to infer dependencies for Java sources, etc., right? And it needs to be pluggable so that someone can give their own implementation for how to infer for a Python target, right?
cc @witty-crayon-22786
w
yep
h
Okay, that could either make this extremely complex or be fairly simple, depending on this: Do you think it’s acceptable to use the identity of the `Dependencies` field to signal which dep inference implementation to use? E.g. `PythonDependencies` and subclasses -> the Python rule that knows how to analyze Python code. We would enforce that there is only one dep inference implementation for that type to avoid ambiguity, like we do with codegen. As opposed to, for example, finding the corresponding `Sources` field for the target and seeing if it’s `PythonSources`.
For context, this is the basic flow I have right now in the `resolve_dependencies` rule:
provided = field.raw_value

injected = result of running all injection rules for this Dependencies field type, like `ProtobufDependencies` having one

inferred = result of running the one single inferrer rule for this `Dependencies` field type, if any

return [*provided, *injected, *inferred]
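A minimal, self-contained sketch of that flow in plain Python. Every name here is an illustrative placeholder (the `Dependencies` classes, the registries, the function), not the real Pants engine API; it only models the provided/injected/inferred merge and the exact-type-match lookup discussed just below.

```python
from typing import Callable, Dict, List, Sequence, Type


class Dependencies:
    """Toy stand-in for the Dependencies field; the real field is richer."""

    def __init__(self, raw_value: Sequence[str], address: str) -> None:
        self.raw_value = tuple(raw_value)
        # The field remembers the address of the target that owns it.
        self.address = address


class PythonDependencies(Dependencies):
    pass


InjectionRule = Callable[[Dependencies], Sequence[str]]
InferenceRule = Callable[[Dependencies], Sequence[str]]

# Registered per concrete Dependencies type, i.e. an exact type match; at most
# one inference implementation per type, to avoid ambiguity like codegen does.
injection_rules: Dict[Type[Dependencies], List[InjectionRule]] = {}
inference_rules: Dict[Type[Dependencies], InferenceRule] = {}


def resolve_dependencies(field: Dependencies) -> List[str]:
    provided = list(field.raw_value)
    injected = [
        addr
        for rule in injection_rules.get(type(field), [])
        for addr in rule(field)
    ]
    inferrer = inference_rules.get(type(field))
    inferred = list(inferrer(field)) if inferrer else []
    return [*provided, *injected, *inferred]
```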
w
Do you think it’s acceptable to use the identity of the `Dependencies` field to signal which dep inference implementation to use? E.g. `PythonDependencies` and subclasses -> the Python rule that knows how to analyze Python code.
yes, in general i do. i’d go further and say that each concrete class should require its own rule so that it’s an exact type match (but that’s the other conversation, heh.)
h
Wonderful. Thank you - I had a temporary moment of panic hehe
w
dep inference will need to recurse to request the `Sources`, to compute the output
… presumably without requesting codegen. but yea.
(…which would end up enforcing at some level that PythonDependencies and PythonSources were matched, via an interesting recursive dance, heh)
h
Agreed. The plan for now is to pass the original `Dependencies` field to the inference rule. This field stores the `address`, so, in the rule, we can call `await Get[Target](Address, field.address)`.
presumably without requesting codegen
Agreed. This current design wouldn’t do that because it would encounter a `protobuf_library` and see “I don’t have an inference rule for ProtobufLibrary; skip.”
w
well, i mean the dep inference rule
anywho
yep
h
i mean the dep inference rule
As do I, but I think we’re on the same page here.
Okay, this means I can indeed put up the injection PR right now then 🙂
w
Agreed. This current design wouldn’t do that because it would encounter a `protobuf_library` and see “I don’t have an inference rule for ProtobufLibrary; skip.”
what i meant here is that while inferring deps, a rule for PythonDependencies would request PythonSources for the Sources field, which might trigger codegen otherwise
h
what i meant here is that while inferring deps, a rule for PythonDependencies would request PythonSources for the Sources field, which might trigger codegen otherwise
This would only happen if in that rule we said `HydrateSourcesRequest(codegen_enabled=True)`. That is off by default.
w
it should request PythonSources from a correctness perspective, although it might be able to get away with just requesting whatever kind of sources are there based on some assumption of validation. but then it might actually end up inferring python deps for some arbitrary combination of fields where someone declared something like `PythonDependencies + RustSources`
h
It can request `for_sources_types=[PythonSources], codegen_enabled=False`
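A toy illustration of that request, using the parameter names from this conversation (`for_sources_types`, `codegen_enabled`); treat the class and its fields as assumptions, since the real `HydrateSourcesRequest` API may differ.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Type


class Sources:
    """Toy stand-in for the Sources field."""

    def __init__(self, files: Sequence[str]) -> None:
        self.files = tuple(files)


class PythonSources(Sources):
    pass


class ProtobufSources(Sources):
    pass


@dataclass(frozen=True)
class HydrateSourcesRequest:
    field: Sources
    for_sources_types: Tuple[Type[Sources], ...]
    codegen_enabled: bool


def hydrate_sources(request: HydrateSourcesRequest) -> Tuple[str, ...]:
    if isinstance(request.field, request.for_sources_types):
        # The sources already have a requested type, e.g. PythonSources.
        return request.field.files
    if request.codegen_enabled:
        # Codegen (e.g. Protobuf -> Python) would run here; dep inference
        # leaves it disabled, so mismatched sources are simply skipped.
        pass
    return ()


# During Python dep inference, a protobuf_library's sources are skipped
# instead of being generated into Python:
request = HydrateSourcesRequest(
    ProtobufSources(["foo.proto"]),
    for_sources_types=(PythonSources,),
    codegen_enabled=False,
)
assert hydrate_sources(request) == ()
```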
w
yep.
h
some arbitrary combination of fields where someone declared something like PythonDependencies + RustSources
Yay Field-Driven API? 🙃 hehe
w
heh, yep.
that’s what i meant by “strange recursive validation dance”
👍 1
it does make me wonder whether this should really be kicked off by dependencies subclasses rather than sources subclasses, but… can ignore that for now =P
h
it does make me wonder whether this should really be kicked off by dependencies subclasses rather than sources subclasses
I don’t follow
w
it’s not strictly the Dependencies field that decides whether some dependencies can be inferred for some sources
it’s the sources field… we’re just using the dependencies field to kick off the recursion
h
Agreed, it really is about what the `Sources` field’s identity is. We could make this work without making call sites more complex, I think (one of my main goals). In `resolve_dependencies`, call `await Get[WrappedTarget](Address, deps_field.address)`, then inspect the identity of the `Sources` field, if any.
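A toy continuation of the earlier sketches (reusing the illustrative `Dependencies` and `Sources` classes) showing that dispatch: look up the owning target from the field’s address, standing in for `await Get[WrappedTarget](Address, deps_field.address)`, then choose an inference implementation based on the type of the target’s `Sources` field, if any.

```python
from typing import Callable, Dict, Optional, Tuple, Type


class Target:
    """Toy stand-in for WrappedTarget: an address plus its Sources field."""

    def __init__(self, address: str, sources: Optional[Sources]) -> None:
        self.address = address
        self.sources = sources


# Toy Address -> Target lookup in place of `Get[WrappedTarget](Address, ...)`.
targets_by_address: Dict[str, Target] = {}

# One inference implementation per concrete Sources type, e.g. PythonSources.
inference_by_sources_type: Dict[Type[Sources], Callable[[Target], Tuple[str, ...]]] = {}


def infer_deps(deps_field: Dependencies) -> Tuple[str, ...]:
    target = targets_by_address[deps_field.address]
    if target.sources is None:
        return ()
    inferrer = inference_by_sources_type.get(type(target.sources))
    # No implementation registered for this Sources type (e.g. ProtobufSources
    # with no Python inference): skip rather than error.
    return inferrer(target) if inferrer else ()
```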
w
can iterate once it’s in i think.
It’s Alpha TM
h
True, but I don’t think adding the `await Get[WrappedTarget]` so that we base this on the `Sources` identity is much more complex. (I thought it would be way more involved when I started this thread)
w
k
mentioned yesterday, i do idly wonder whether some of these things would be easier if they operated on batches of fields
anywho anywho
h
i do idly wonder whether some of these things would be easier if they operated on batches of fields
the `AsyncField` PR confirmed my intuition that I don’t think so. Most of those call sites resolved dependencies at very different stages than hydrating sources. Some, like `setup-py`, never even need sources.
Okay, put up part 1, dependency injection: https://github.com/pantsbuild/pants/pull/9851
Took me a while to figure out how to explain why I don’t think precise dependencies are worth the complexity. Not sure I argue it very well, but I feel fairly strongly about it and can try to reword if it doesn’t make sense.
w
nice. thanks! will look this afternoon
❤️ 1
h
So, my sole concern about doing the `await Get[WrappedTarget]` in `resolve_dependencies` is an unnecessary perf hit if you don’t even have dep inference. While small, there is a cost to go from `Address -> Target`, such as calling the constructor for every `Field` registered on the target. Two ways to solve this, I think: 1. Add a global flag `--dependency-inference-enabled`. 2. Simply check the `UnionMembership` to see if there are any implementations registered in the first place. I think #2 is better, right? Simpler and easier to onboard folks.
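A sketch of option #2 in terms of the same toy registries: resolve `Address -> Target` only when at least one inference implementation is registered at all, with the registry check standing in for consulting `UnionMembership`.

```python
from typing import Tuple


def resolve_dependencies(field: Dependencies) -> Tuple[str, ...]:
    provided = tuple(field.raw_value)
    injected = tuple(
        addr
        for rule in injection_rules.get(type(field), [])
        for addr in rule(field)
    )
    # Option 2: skip the Address -> Target resolution entirely when nothing is
    # registered (the toy stand-in for checking UnionMembership), so targets
    # without dep inference never pay for hydrating every Field.
    inferred: Tuple[str, ...] = ()
    if inference_by_sources_type:
        inferred = infer_deps(field)
    return (*provided, *injected, *inferred)
```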
w
mm… i think i’d want to see it showing up in traces before adding flags to enable or disable
although we might independently have flags to control inference.
i also wouldn’t bucket “inference” with the injection that happens for codegen… the latter is a thing that folks mostly already expect to happen.
(…from a settings perspective… obviously related from an implementation perspective)
👍 1
h
I’ll go with the second approach, for now. Beyond perf, it’s clearer code. Thanks!
w
i’m not sure… it’s redundant, right? because you’re checking in two places