Regarding the current work adding `description of origin` da Pants #development

Regarding the current work adding `description_of_...

curved-television-6568

06/07/2022, 12:36 PM

Regarding the current work adding

description_of_origin

data to various request types, I’m wondering if there’s not a better more generic approach to this? 🧵

curved-television-6568

06/07/2022, 12:37 PM

Rather than augmenting each and every request type, I feel that this kind of information is desirable for all rules in face of errors. What is wrong and how did we end up here.

curved-television-6568

06/07/2022, 12:39 PM

In order to make that work generically, I imagine there could be some feature where this information can be provided during rule execution that sets some context that may be inspected/reported in case of errors.

curved-television-6568

06/07/2022, 12:40 PM

Either before a

Get

call, or as an additional argument..

curved-television-6568

06/07/2022, 12:40 PM

(I’ll ponder some more on this…)

bitter-ability-32190

06/07/2022, 1:38 PM

I actually am jiving with this train of thought 🤔

bitter-ability-32190

06/07/2022, 1:40 PM

I like that

description_of_origin

gives the user rich context, but for the developer I always find it hard to reason about because it's really provided in the context of some other message. There has to be a way (and I'm such it's very delicate) where there isn't such tight coupling between the caller and callee

curved-television-6568

06/07/2022, 1:51 PM

Indeed, I think that the

description_of_origin

shows what it is we want to achieve, only I see it as very disruptive and ad-hoc which makes it inhibitly (not a word according to my spell check? 😛 ) difficult pattern to apply generically for more (ideally all) scenarios.

➕ 1

curved-television-6568

06/07/2022, 2:12 PM

cc @hundreds-father-404 (I’m sure you’ll find this thread any way, but hey… 😉 )

hundreds-father-404

06/07/2022, 2:22 PM

So the information must come from somewhere from a plugin author. We can't automatically infer the context something comes from, like "CLI arguments" vs "the option --path-to" I don't think there's any way to get around the human declaring the context? From there, it's only a question of the best way to factor how the human sets that information

➕ 1

curved-television-6568

06/07/2022, 2:23 PM

Yep, agreed.

curved-television-6568

06/07/2022, 2:24 PM

What I’m looking for I guess is if this is possible to get (err, provide is perhaps more accurate), out of band, in some way.

curved-television-6568

06/07/2022, 2:27 PM

Take this for example:

[Eric] I think that we have to get rid of
Addresses -> UnexpandedTargets -> Targets
, and instead add
TargetsRequest
.

from https://github.com/pantsbuild/pants/issues/14468#issuecomment-1147977415 Say that at the call site for

await Get(Targets, Addresses(…))

we could do something like this instead:

await Get(Targets, Addresses(…), description_of_origin="…")

that would propagate through all the rules so if you end up in an error handler, you can include that origin description in the error message.

hundreds-father-404

06/07/2022, 2:30 PM

I think the challenge there is how in any given

@rule

, you might have multiple "origins" in play at the same time. For example,

./pants paths

has both

--paths-to

and

--paths-from

..then while hydrating the dependency graph, we have tons of

dependencies

fields in play So we need to keep the

description_of_origin

tightly coupled to each specific caller or

Get

Given that, what would be the benefit of

Get(Targets, Addresses(…), description_of_origin="…")

Get(Targets, TargetsRequest([addresses], description_of_origin="..."))

. Afaict that's only a different typing

curved-television-6568

06/07/2022, 2:31 PM

Perhaps this could be a form of “rule execution context” where any kwargs are preserved, and passed to rules when requested as inputs.

Copy code

@rule
async def my_rule(request: SomeRequestType, rule_context: RuleExecutionContext) -> ...:
  if bad_stuff:
    raise InvalidStuff(f"This was no good, because {rule_context.get('reasons', 'unknown')}.")
...

curved-television-6568

06/07/2022, 2:34 PM

[…] you might have multiple “origins” in play at the same time.

yeah, that could prove my idea to be a bad one. (I’ll see if I get the chance to poc something up with this, time permitting, to see if that’s the case)

👍 1

curved-television-6568

06/07/2022, 2:35 PM

Afaict that’s only a different typing

as I see it, it’s more than different typing, as you don’t need to change the structure of your request types, nor introduce new types.

witty-crayon-22786

06/07/2022, 4:12 PM

fwiw: in the past, we’ve discussed supporting the catching of exceptions to allow for adding more context and then rethrowing.

witty-crayon-22786

06/07/2022, 4:13 PM

Perhaps this could be a form of “rule execution context” where any kwargs are preserved, and passed to rules when requested as inputs.

this is effectively https://github.com/pantsbuild/pants/issues/7490 i think

👍 1

witty-crayon-22786

06/07/2022, 4:14 PM

a third idea is essentially a “Try” syntax of some sort…

await Try

, which could fail and return a value

🤔 2

witty-crayon-22786

06/07/2022, 4:57 PM

ok, fourth idea, which i think is my preference. lean in on engine stacktraces, even when

--no-print-stacktrace

. in particular, https://github.com/pantsbuild/pants/issues/10536 talks about adding EngineAware params verbatim to the end of the description of a

@rule

. if we did that, and then even when

--no-print-stracktrace

, we rendered the stack of rules with descriptions, i think that it would catch 95% of these. the primary risk is that anytime we use this data for some “new” (sortof) use case, we complicate what it means to add a description, set the level, etc. but i think that that just means that it is a very powerful mechanism

witty-crayon-22786

06/07/2022, 4:59 PM

also: @curved-television-6568: thanks for raising this. i was willing to avoid talking about it in the interest of getting https://github.com/pantsbuild/pants/issues/14468 done, but it really does seem like a lot of contortion is necessary

❤️ 1

witty-crayon-22786

06/07/2022, 5:52 PM

yea, i’m pretty certain that we should do that. sketched it out here: https://github.com/pantsbuild/pants/pull/15769

💯 1

hundreds-father-404

06/08/2022, 7:19 PM

Thanks @witty-crayon-22786, I think #15769 is a good change regardless. I still strongly advocate for

description_of_origin

though because we need the precision of individual Gets: •

--paths-from

--paths-to

, whereas otherwise the most precise we have is "the

--paths

goal" •

archive(files=)

archive(package=)

, whereas otherwise we just have "resolving dependencies" •

python_distribution(entry_points=)

python_distribution(provides=setup_py(...))

•

python_test(runtime_package_dependencies)

hundreds-father-404

06/08/2022, 7:21 PM

I'm almost done with PRs to update all of Pants to do the right thing. I don't expect it to be too onerous for new callers to use

Get(WrappedTarget, WrappedTargetRequest(address, description_of_origin="the option --paths-to"))

Get(WrappedTarget, Address, address)

? If anything, that's somewhat more predictable. Follows our X and XRequest convention

hundreds-father-404

06/08/2022, 7:24 PM

My most important concern is that, here, precision matters. A less strong concern is how hard it is in practice to get people to correctly set

EngineAware

metadata -- folks may not realize it should be set in a particular way for error messages to render exactly the right way. Whereas

description_of_origin

intentionally must be set

bitter-ability-32190

06/08/2022, 7:29 PM

Not saying I have a solution, but perhaps it would be possible to make

description_of_origin

a bit more clientside-intuitive? I think that's the pain point here.

hundreds-father-404

06/08/2022, 7:30 PM

What do you mean?

bitter-ability-32190

06/08/2022, 7:31 PM

I like that
description_of_origin
gives the user rich context, but for the developer I always find it hard to reason about because it's really provided in the context of some other message.

There has to be a way (and I'm sure it's very delicate) where there isn't such tight coupling between the caller and callee

👍 1

hundreds-father-404

06/08/2022, 7:33 PM

Something like a better name for the argument? Or a more fundamental change?

bitter-ability-32190

06/08/2022, 7:33 PM

I'm not picky, I'll take what I can get 😛

hundreds-father-404

06/08/2022, 7:37 PM

My first thought was something more precise like

invalid_address_error_origin

or something, but it's more complicated than that because the same

description_of_origin

may be reused across multiple error messages Let's take

the option --paths-to

, which uses

Specs

codepath. Possible error locations: •

Specs

rules, e.g. using a file path that does not exist •

AddressInput

rules, that an address literal is malformed like using too many

../

in a file address •

Address

rules, that the address is well-formed but does not exist We want to use the same

description_of_origin="the option --paths-to"

in all those cases

witty-crayon-22786

06/08/2022, 7:42 PM

I still strongly advocate for
description_of_origin
though because we need the precision of individual Gets:

hm, i see what you mean. that would require something closer to @curved-television-6568’s

Get(.., context=..)

syntax.

hundreds-father-404

06/08/2022, 7:43 PM

Exactly. Or...

description_of_origin

😛

witty-crayon-22786

06/08/2022, 7:43 PM

except that the latter requires changes to APIs throughout the codebase, whereas the former would be built in

witty-crayon-22786

06/08/2022, 7:44 PM

i agree with the thrust of this thread: we should have a facility for a caller to enrich an error without needing to push down context in a type. exception handling is messy, but is the best analogy i think.

hundreds-father-404

06/08/2022, 7:44 PM

You'd still have to update all callers, no? It's important that all call sites are using these better error messages imo Another way I could have implemented this is to set

description_of_origin: str | None = None

, and have the old error messages + new ones. We could still do that! But my thinking was we always want high quality error messages

witty-crayon-22786

06/08/2022, 7:45 PM

You’d still have to update all callers, no? It’s important that all call sites are using these better error messages imo

only the caller: not the callee

witty-crayon-22786

06/08/2022, 7:46 PM

…oh. so maybe not exactly the same as @curved-television-6568’s idea. i was suggesting that the context would become part of your failure stack, not that it actually be pushed down to the callee

witty-crayon-22786

06/08/2022, 7:46 PM

similar to if you had caught an exception, enriched it, and re-thrown.

hundreds-father-404

06/08/2022, 7:47 PM

From an API perspective, I don't buy that going from

Address -> WrappedTarget

WrappedTargetRequest -> WrappedTarget

is worse. It is for sure disruptive. But after we fix everything, I think it's actually better 1) More consistent with how we usually name things 2) We can add additional enrichment when we need, without changing rule types

witty-crayon-22786

06/08/2022, 7:52 PM

to be clear: i’ll shipit changes for description_of_origin: this is just a discussion of how error enrichment should happen in general.

1) More consistent with how we usually name things

2) We can add additional enrichment when we need, without changing rule types

i do think that it is worse in the long run. the Request pattern is not something that we will need in the long run, because https://github.com/pantsbuild/pants/issues/7490 should make many of them unnecessary.

➕ 1

👍 2

witty-crayon-22786

06/08/2022, 7:52 PM

not to mention the fact that needing to opt out of memoization signals that this field is not actually callee specific data: rather, caller specific. and could stay on their stack.

hundreds-father-404

06/08/2022, 7:57 PM

i’ll shipit changes for description_of_origin

Cool. First two are ready to land, and unblock next two. I think I can land this all in time for the alpha release

curved-television-6568

06/09/2022, 5:47 AM

I'll second that I'm not against the current “description of origin” field; but wanted raise the question about how to go about supporting this kind of question generically for more situations.

curved-television-6568

06/09/2022, 5:50 AM

I’m starting to feel that the argument that we must be precise and keep track of the request made to not get mixed up with everything else should “just work”. There can't be ambiguities or we couldn't be sure we get the correct results for our queries.

Open in Slack

Previous Next