Regarding the current work adding `description_of_...
# development
c
Regarding the current work adding
description_of_origin
data to various request types, I’m wondering if there’s not a better more generic approach to this? 🧵
Rather than augmenting each and every request type, I feel that this kind of information is desirable for all rules in face of errors. What is wrong and how did we end up here.
In order to make that work generically, I imagine there could be some feature where this information can be provided during rule execution that sets some context that may be inspected/reported in case of errors.
Either before a
Get
call, or as an additional argument..
(I’ll ponder some more on this…)
b
I actually am jiving with this train of thought 🤔
I like that
description_of_origin
gives the user rich context, but for the developer I always find it hard to reason about because it's really provided in the context of some other message. There has to be a way (and I'm such it's very delicate) where there isn't such tight coupling between the caller and callee
c
Indeed, I think that the
description_of_origin
shows what it is we want to achieve, only I see it as very disruptive and ad-hoc which makes it inhibitly (not a word according to my spell check? 😛 ) difficult pattern to apply generically for more (ideally all) scenarios.
1
cc @hundreds-father-404 (I’m sure you’ll find this thread any way, but hey… 😉 )
h
So the information must come from somewhere from a plugin author. We can't automatically infer the context something comes from, like "CLI arguments" vs "the option --path-to" I don't think there's any way to get around the human declaring the context? From there, it's only a question of the best way to factor how the human sets that information
1
c
Yep, agreed.
What I’m looking for I guess is if this is possible to get (err, provide is perhaps more accurate), out of band, in some way.
Take this for example:
[Eric] I think that we have to get rid of
Addresses -> UnexpandedTargets -> Targets
, and instead add
TargetsRequest
.
from https://github.com/pantsbuild/pants/issues/14468#issuecomment-1147977415 Say that at the call site for
await Get(Targets, Addresses(…))
we could do something like this instead:
await Get(Targets, Addresses(…), description_of_origin="…")
that would propagate through all the rules so if you end up in an error handler, you can include that origin description in the error message.
h
I think the challenge there is how in any given
@rule
, you might have multiple "origins" in play at the same time. For example,
./pants paths
has both
--paths-to
and
--paths-from
..then while hydrating the dependency graph, we have tons of
dependencies
fields in play So we need to keep the
description_of_origin
tightly coupled to each specific caller or
Get
Given that, what would be the benefit of
Get(Targets, Addresses(…), description_of_origin="…")
vs
Get(Targets, TargetsRequest([addresses], description_of_origin="..."))
. Afaict that's only a different typing
c
Perhaps this could be a form of “rule execution context” where any kwargs are preserved, and passed to rules when requested as inputs.
Copy code
@rule
async def my_rule(request: SomeRequestType, rule_context: RuleExecutionContext) -> ...:
  if bad_stuff:
    raise InvalidStuff(f"This was no good, because {rule_context.get('reasons', 'unknown')}.")
...
[…] you might have multiple “origins” in play at the same time.
yeah, that could prove my idea to be a bad one. (I’ll see if I get the chance to poc something up with this, time permitting, to see if that’s the case)
👍 1
Afaict that’s only a different typing
as I see it, it’s more than different typing, as you don’t need to change the structure of your request types, nor introduce new types.
w
fwiw: in the past, we’ve discussed supporting the catching of exceptions to allow for adding more context and then rethrowing.
Perhaps this could be a form of “rule execution context” where any kwargs are preserved, and passed to rules when requested as inputs.
this is effectively https://github.com/pantsbuild/pants/issues/7490 i think
👍 1
a third idea is essentially a “Try” syntax of some sort…
await Try
, which could fail and return a value
🤔 2
ok, fourth idea, which i think is my preference. lean in on engine stacktraces, even when
--no-print-stacktrace
. in particular, https://github.com/pantsbuild/pants/issues/10536 talks about adding EngineAware params verbatim to the end of the description of a
@rule
. if we did that, and then even when
--no-print-stracktrace
, we rendered the stack of rules with descriptions, i think that it would catch 95% of these. the primary risk is that anytime we use this data for some “new” (sortof) use case, we complicate what it means to add a description, set the level, etc. but i think that that just means that it is a very powerful mechanism
also: @curved-television-6568: thanks for raising this. i was willing to avoid talking about it in the interest of getting https://github.com/pantsbuild/pants/issues/14468 done, but it really does seem like a lot of contortion is necessary
❤️ 1
yea, i’m pretty certain that we should do that. sketched it out here: https://github.com/pantsbuild/pants/pull/15769
💯 1
h
Thanks @witty-crayon-22786, I think #15769 is a good change regardless. I still strongly advocate for
description_of_origin
though because we need the precision of individual Gets: •
--paths-from
vs
--paths-to
, whereas otherwise the most precise we have is "the
--paths
goal" •
archive(files=)
vs
archive(package=)
, whereas otherwise we just have "resolving dependencies" •
python_distribution(entry_points=)
vs
python_distribution(provides=setup_py(...))
python_test(runtime_package_dependencies)
I'm almost done with PRs to update all of Pants to do the right thing. I don't expect it to be too onerous for new callers to use
Get(WrappedTarget, WrappedTargetRequest(address, description_of_origin="the option --paths-to"))
vs
Get(WrappedTarget, Address, address)
? If anything, that's somewhat more predictable. Follows our X and XRequest convention
My most important concern is that, here, precision matters. A less strong concern is how hard it is in practice to get people to correctly set
EngineAware
metadata -- folks may not realize it should be set in a particular way for error messages to render exactly the right way. Whereas
description_of_origin
intentionally must be set
b
Not saying I have a solution, but perhaps it would be possible to make
description_of_origin
a bit more clientside-intuitive? I think that's the pain point here.
h
What do you mean?
b
I like that
description_of_origin
gives the user rich context, but for the developer I always find it hard to reason about because it's really provided in the context of some other message.
There has to be a way (and I'm sure it's very delicate) where there isn't such tight coupling between the caller and callee
👍 1
h
Something like a better name for the argument? Or a more fundamental change?
b
I'm not picky, I'll take what I can get 😛
h
My first thought was something more precise like
invalid_address_error_origin
or something, but it's more complicated than that because the same
description_of_origin
may be reused across multiple error messages Let's take
the option --paths-to
, which uses
Specs
codepath. Possible error locations: •
Specs
rules, e.g. using a file path that does not exist •
AddressInput
rules, that an address literal is malformed like using too many
../
in a file address •
Address
rules, that the address is well-formed but does not exist We want to use the same
description_of_origin="the option --paths-to"
in all those cases
w
I still strongly advocate for
description_of_origin
though because we need the precision of individual Gets:
hm, i see what you mean. that would require something closer to @curved-television-6568’s
Get(.., context=..)
syntax.
h
Exactly. Or...
description_of_origin
😛
w
except that the latter requires changes to APIs throughout the codebase, whereas the former would be built in
i agree with the thrust of this thread: we should have a facility for a caller to enrich an error without needing to push down context in a type. exception handling is messy, but is the best analogy i think.
h
You'd still have to update all callers, no? It's important that all call sites are using these better error messages imo Another way I could have implemented this is to set
description_of_origin: str | None = None
, and have the old error messages + new ones. We could still do that! But my thinking was we always want high quality error messages
w
You’d still have to update all callers, no? It’s important that all call sites are using these better error messages imo
only the caller: not the callee
…oh. so maybe not exactly the same as @curved-television-6568’s idea. i was suggesting that the context would become part of your failure stack, not that it actually be pushed down to the callee
similar to if you had caught an exception, enriched it, and re-thrown.
h
From an API perspective, I don't buy that going from
Address -> WrappedTarget
vs
WrappedTargetRequest -> WrappedTarget
is worse. It is for sure disruptive. But after we fix everything, I think it's actually better 1) More consistent with how we usually name things 2) We can add additional enrichment when we need, without changing rule types
w
to be clear: i’ll shipit changes for description_of_origin: this is just a discussion of how error enrichment should happen in general.
1) More consistent with how we usually name things
2) We can add additional enrichment when we need, without changing rule types
i do think that it is worse in the long run. the Request pattern is not something that we will need in the long run, because https://github.com/pantsbuild/pants/issues/7490 should make many of them unnecessary.
1
👍 2
not to mention the fact that needing to opt out of memoization signals that this field is not actually callee specific data: rather, caller specific. and could stay on their stack.
h
i’ll shipit changes for description_of_origin
Cool. First two are ready to land, and unblock next two. I think I can land this all in time for the alpha release
c
I'll second that I'm not against the current “description of origin” field; but wanted raise the question about how to go about supporting this kind of question generically for more situations.
I’m starting to feel that the argument that we must be precise and keep track of the request made to not get mixed up with everything else should “just work”. There can't be ambiguities or we couldn't be sure we get the correct results for our queries.