re: <https://github.com/pantsbuild/pants/pull/9316...
# development
i don't think: 1.
PythonBinaryFields
needs to be a RootRule, 2. that it would need to be a concrete type even if it was going to come in from outside the engine
anything you can do to hydrate a set of fields (even if some of them are not AsyncField) you can due generically in a rule that consumes a
SelectFields
type, afaict.
because the implementation of SelectFields would look like this:
Copy code
@rule
def hydrate_multiple_fields(select_fields: SelectFields) -> SelectedFields:
  hydrated_async_fields = await MultiGet(
    Get[HydratedField](AsyncField, field)
    for field in select_fields
    if isinstance(field, AsyncField)
  )

  fields
  for field in select_fields:
    if isinstance(field, AsyncField):
      hydrated = hydrated_async_fields.pop_first()
    elif isisntance(field, SyncField):
      hydrated = field.hydrate()
    else:
      raise Exception("hmmm?")
    fields.append(hydrated)
  return SelectedFields(fields)
@hundreds-father-404: ^
h
anything you can do to hydrate a set of fields (even if some of them are not AsyncField) you can due generically in a rule that consumes a SelectFields type, afaict.
I’m sure we could find some way to get a
SelectFields
type working, if we tried hard enough, but I don’t understand the motivation to have this
Copy code
def rule(fields: SelectFields(PythonBinarySources, Compatibility)) -> CreatedBinary
instead of this:
Copy code
class PythonBinaryFields(FieldsCombination):
  required_fields = (EntryPoint, PythonBinarySources)

def rule(fields: PythonBinaryFields) -> CreatedBinary
Is it for less boilerplate to not need to create a new class? -- Also, with
SelectFields
, how do you express
optional_fields
and
opt_out_field
? (See https://github.com/pantsbuild/pants/pull/9316#issuecomment-600154783)
w
yes. afaict it is boilerplate
the reason to have a distinct type would be to provide the ability to have unique rules run to hydrate that combination of fields... and i don't think that's what we're going for
optional and opt out would be implemented by checking whether fields were optional there, i think?
h
yes. afaict it is boilerplate
Okay, so the reason for
SelectFields
is less boilerplate and it sounds like a conceptual argument that we should not have a distinct combination type? What again is the reason that conceptually it is wrong to have a new type registered? (Not boilerplate, I mean conceptually) Are there other motivations for
SelectFields
?
w
What again is the reason that conceptually it is wrong to have a new type registered?
if you used a concrete subclass of SelectFields instead, and you had two of them with completely identical contents (fields X, Y, Z for address A), you would not want to hydrate them twice. they are identical
based purely on 1) the fields, 2) the address
the only reason they might not be identical would be if you had custom logic to hydrate that combination of fields (and i've asked whether that is a goal... it doesn't sound like it is)
afaict,
SelectFields
is shorthand for the caller to avoid needing to hydrate the fields one by one themselves
cc @average-vr-56795 ^ since you expressed interest in this discussion
🙏 1
👍 1
h
afaict, SelectFields is shorthand for the caller to avoid needing to hydrate the fields one by one themselves
Taking a step back, I don’t think hydration of `AsyncField`s is the right thing to optimize for. 95% of fields are
PrimitiveFields
and simply accessed by calling
compatibility.value
. The only fields that need hydration are
Sources
,
Dependencies
, and
Bundles
. (Keep in mind, we’ve been writing targets for 5 years so while we will likely add a couple more `AsyncField`s the next few years, we do have a good intuition that the vast majority of fields are
PrimitiveFields
.) Further, per https://github.com/pantsbuild/pants/pull/9306#issuecomment-599694654, we don’t even expect any rules to directly need to hydrate the
Sources
field. They instead will call
await Get[SourceFiles](AllSourceFilesRequest)
or
await Get[SourceFiles](SpecifiedSourceFilesRequest)
Which speaks to another point, I don’t think we want the rule’s signature to give us the already hydrated values. We only want the collection of
Field
instances, for a couple reasons: 1. It’s essential that we are dealing with the typed
Field
for MyPy to work. There would be no way with `SelectField`s for MyPy to know that
SelectFields(Compatbility, PythonSources)
gives us back
Tuple[Tuple[str, ...], SourcesResult]
. We could have MyPy infer that it’s
Tuple[Compatibility, PythonSources]
if we didn’t do any hydration within
SelectFields
, but then we end up needing to do all hydration in the rule body, as before. 2. We don’t know how that typed
Field
is going to be consumed. For example, we need to pass the
Field
instance to
await Get[SourceFiles](SpecifiedSourceFilesRequest((sources_field, origin))
w
Re: 1): my feeling is that if we're requiring people to declare concrete subclasses of something to appease MyPy in cases where we can otherwise guarantee that something is safe, we would be pushing work to the user that would be better spent appeasing MyPy with generics in more places in our API
Re: 2) agreed. But that might argue in the direction of waiting to introduce these patterns until we start to see repetition. It feels pretty unclear at this point how and how frequently "compound field hydration" will be used, so maybe it isn't a good time for that abstraction?
h
It feels pretty unclear at this point how and how frequently “compound field hydration” will be used, so maybe it isn’t a good time for that abstraction?
Gotta run to a meeting, but keep in mind that we aren’t formalizing any abstraction. The PR very intentionally doesn’t yet add a superclass because I agree that we’re not yet ready to commit to a particular implementation. In contrast, afaict,
SelectFields
would be a new abstraction we’re commiting to. Maybe we do at the end of the day use
SelectFields
. This PR doesn’t at all block that - we would then have a diff to see with the actual code “which do we prefer?”
w
i'm currently leaning toward waiting until we've repeated ourselves a few times, and then introducing a feature like this that clearly removes steps/boilerplate/complexity
rather than doing it preemptively.
but yea, can talk later.
h
i’m currently leaning toward waiting until we’ve repeated ourselves a few times, and then introducing a feature like this that clearly removes steps/boilerplate/complexity
Agreed. This is the approach I’m trying to take, hence this PR that does the simplest thing possible (using a plain and simple dataclass). We can clean up easily later, once we’re ready to optimize
w
the simplest thing possible is to directly request the fields you need, one by one
right...?
h
the simplest thing possible is to directly request the fields you need, one by one
Negative. There are currently no rules to say “Get me a Compatibility field”. That would involve defining an entirely new mechanism and a bunch of rules to say
Target -> Compatibility
. The simplest possible implementation is that you have a plain and simple
Target
and you use pure Python to say
tgt.get(Compatibility)
. As a general principle, avoiding the engine is generally simpler than using it
w
sure. but, there must be independent rules to hydrate each AsyncField type. and a caller can invoke them
adding an abstraction that allows more than one to be invoked at once seems to be what this PR is about
h
Yes, but there are only 3 `AsyncField`s in the entire target universe, and we us
determine_source_files.py
as an abstraction over hydrating
Sources
already. So that leaves us
Bundles
, which we’re ignoring, and `Dependencies, which is unclear what hydration should look like and sometimes won't even be necessary because its
sanitizied_raw_value
is already a
Tuple[Address, …]
, which is useful
adding an abstraction that allows more than one to be invoked at once seems to be what this PR is about
See https://pantsbuild.slack.com/archives/C0D7TNJHL/p1584467587332900?thread_ts=1584466038.329400&amp;cid=C0D7TNJHL again. This PR is not in any way about adding sugar to hydrating `AsyncField`s. It is solely about “I have a bunch of Targets and I want to filter out the ones that can be used by this downstream rule, based on which
Fields
it has defined.”
w
feels like some whiteboard time might be good for this.
@hundreds-father-404: does 8am PT tomorrow work for you?
h
Sure, thanks! My best friend is coming home early from Fulbright in Mexico today, and there’s a possibility she’ll be staying at my apartment tonight and a possibility that she won’t land until really late (like 2 am) if she gets delayed. So, I might need to take a rain check, but otherwise 8 am sounds good. I should know this evening if she’s on track
w
ok
h
I don't have the full technical picture in my head, but I would like to gently encourage not making the great the enemy of the good. It seems like we could land this as-is so that we can make other progress on top of it today, and still whiteboard this tomorrow and hash out possible boilerplate reduction. In fact it might be easier to do so with some concrete boilerplate to look at? I just don't want this to get too bogged down on what feels like something we can improve later, if possible.
w
the issue is that it still seems to me like this obfuscates what we're trying to do here, which at a fundamental level is "loop over the fields and ensure that they exist"
h
It seems like we could land this as-is so that we can make other progress on top of it today
That would be great. I’d love to land this PR so that I can put up the followup to make
HydratedStruct -> Target
a rule and to add the
targets()
end-point to
register.py
w
doing that and then removing that boilerplate seems clearer
h
doing that and then removing that boilerplate seems clearer
I’m not sure what the first
that
refers to, but the plan is to put up that followup PR right away We aren’t committing to anything with these rapid-fire Target API PRs. I’m trying to break it down into digestible chunks, which means that much of it is not the final API and is very subject to change
w
the
that
is: "a loop over the fields i want" vs "concrete class that describes my loop"
h
Whether we use it as a method on a dataclass or through your proposal, I think we are fundamentally describing the same thing. Given a
Target
, loop over all of its registered `Field`s to see if the
Target
has those 2 fields and then grab those 2 fields to be consumed by the rule. Right? The precise mechanism can change still. We only need some mechanism to be able to build off of it.
w
Given a
Target
, loop over all of its registered `Field`s to see if the
Target
has those 2 fields and then grab those 2 fields to be consumed by the rule. Right?
yes. my feeling is that we should do that directly in consumers for now
and if it ends up looking repetitive, add more abstraction
h
yes. my feeling is that we should do that directly in consumers for now
Sure, we could change the PR to directly use
tgt.get(EntryPoint)
, rather than using this
PythonBinaryFields
intermediate type. But, I think we’d agree that we want the input to this rule to have fine-grained caching, that is to only depend on the typed `Field`s that are specifically needed for that rule to run. Even though this PR doesn’t change the rule’s input, the next followup does to have precisely that fine grained caching by having the rule request
PythonBinaryFields
. Agreed that we might end up dropping
PythonBinaryFields
, but in the meantime, by having it, we make progress on two goals: 1. Given a target, check if it has X fields then consume those fields. 2. Have the input to the rule be only the combination of fields needed to run for better caching. If we remove
PythonBinaryFields
and directly call
tgt.get(EntryPoint)
in the rule body, then we would at least be solving goal #1 and it would be some progress to land. With the current PR, we would be solving both #1 and #2, which is even more progress (even though we might change the mechanism for how we do that)
w
we would also be accomplishing 2
unless you called
get
for all fields
h
Maybe this will help. (I realize you don’t have the followup PR to see for context.) In the followup, I was going to add this rule:
Copy code
@rule
async def convert_python_binary_target(adaptor: PythonBinaryAdaptor) -> PythonBinaryFields:
    wrapped_tgt = await Get[WrappedTarget](Address, adaptor.address)
    return PythonBinaryFields.create(wrapped_tgt.target)
Then, have
create_python_binary
take the param
PythonBinaryFields
rather than
PythonBinaryTargetAdaptor
. That change is what I mean by goal #2 of fine grained caching. Currently, with
create_python_binary
using
PythonBinaryTargetAdaptor
as its param, that means that any changes to the target adaptor will invalidate the rule’s cache, even if it’s irrelevant. We could alternatively achieve this with your proposal of changing the rule signature to
Copy code
def create_python_binary(fields: SelectFields(EntryPoint, PythonBinarySources)
as you suggest. Maybe we end up tomorrow refactoring this to use
SelectFields
. Even if we throw away
PythonBinarySources
tomorrow, this gets us closer to achieving goal #2 and helps to guide the discussion
w
which is fine, but why can't ``create_python_binary`` just inline those two lines?
h
(This also achieves goal #3 of extensibility. Regardless of whether we use
PythonBinaryFields
vs.
SelectFields
, we need to stop requesting
PythonBinaryAdaptor
in the rule signature)
w
i'm not even sure that SelectFields is necessary either... the abstraction in general might be premature if the alternative is to inline those lines
h
which is fine, but why can’t
create_python_binary
just inline those two lines?
Because the rule signature is what matters. The motivation is to get
create_python_binary
to stop requesting
PythonBinaryAdaptor
in its signature to instead request in its signature only the Typed Fields it needs. The rule needs to be able to work with targets like
MyCustomPythonBinary
the abstraction in general might be premature if the alternative is to inline those lines
I disagree because of the above points about needing to be extensible / work with custom target types, and to have fine grained caching. We must have some mechanism for the rule’s signature to say “I only need these 2 fields”. That cannot be inlined in the downstream rule. We’re describing two different approaches to that mechanism, but either way, we need a mechanism to do this.
w
We must have some mechanism for the rule’s signature to say “I only need these 2 fields”.
this might be a root of the misunderstanding: ... it does not need to be part of the signature of the rule if it is declared in the Gets for the rule (ie by inlining that code).
a goal could request unhydrated targets, and then only hydrate the ones it cares about, field at a time.
that provides fine grained caching, extensibility, etc.
h
this might be a root of the misunderstanding: ... it does not need to be part of the signature of the rule if it is declared in the Gets for the rule (ie by inlining that code).
Sure, fwit, the plan is for `@goal_rule`s to request
Targets
. This is about what makes sense for downstream rules. I think it’s much simpler for
create_python_binary
to say “I need these 2 fields, get it for me through whatever engine magic” than for it to say “get me a generic Target and I’ll try to do something meaningful with it. If you happened to give me a `PythonTests`…throw an exception? no-op?”
a goal could request unhydrated targets, and then only hydrate the ones it cares about, field at a time.
Agreed, this is precisely the plan
w
Agreed, this is precisely the plan
then let's land that, exactly, and follow up with improved abstraction, i think
I think it’s much simpler for
create_python_binary
to say “I need these 2 fields, get it for me through whatever engine
magic” than for it to say “get me a generic Target and I’ll try to do
something meaningful with it. If you happened to give me a `PythonTests`…throw an exception? no-op?”
i'm not suggesting raising exceptions. i'm suggesting having create_python_binary filter to the targets that have the field.
if it can't do that, then neither can the abstraction... so i assume that it is possible to inline that in the consumer
h
And what does it return if it doesn’t have that field? Right now, the rule always returns CreatedBinary The filtering happens one level of abstraction higher, in binary.py. With this design (using either the dataclass or SelectFields), that setup would remain: filter out relevant targets in binary.py then pass them down to create_python_binary. create_python_library has no knowledge of how it got those targets because that’s not it’s concern. It’s only concern is transforming the 2 fields into CreatedBinary
(Oops didn’t meant to send to channel. On my phone on a walk)
w
ok. i think i see. PythonBinaryFields is being used to select a unique rule
sorry, i tried to clarify that before.
h
Yes, precisely
h
In the most current iteration, it’s not. It’s still inlined. In the immediate followup, we fix that to select a unique rule based off of the unique combination of Fields
w
it feels like they should have been one PR then. this has been really confusing.
it's also called out as part of the PR description without actually being demonstrated.
sorry.
will re-review in the next 30
❤️ 1
h
Apologies. At the time of writing it yesterday, I thought I’d first need a rule to go from HydratedStruct->Target. I realized last night that that was not the case. I could have fixed this PR to do the unique combination thing, but thought “eh hopefully this lands by the time I wake up, then I’ll immediately put up the followup I have queued” Where I was miscalculated was that I should have fixed the original, rather than forward fixing. I’ll put up that fix when I get back from my walk :)
Okay, cleaned it up to do what it should have done from the start and put some thoughts on how to improve the scope of this PR were I to do it again. https://github.com/pantsbuild/pants/pull/9316#pullrequestreview-376427234 Thanks for your help reviewing this one and I apologize for all the confusion!
Woot, got it working for `@goal_rule`s to request
Targets
. Finishes changing
binary
and
run
to exclusively use the Target API! As expected, it causes some good changes from #9316. One of them is that I was wrong about
PythonBinaryFields
being a
RootRule
. Instead, it’s a union member Also @witty-crayon-22786 tomorrow at 8 am works. My friend is going straight home because of a snow storm at her home in Flagstaff tonight that they want to avoid.
👍 1