# development
w
mostly unrelated to the above. @hundreds-breakfast-49010: re https://github.com/gshuflin/pants/commit/a570414cbdbddfd995f50d2aa90df7ea6050d72a and exceptions:
i expect that one of the challenges will be that there are two cases: 1) if a Node has `waiters` while it is running, when it fails it's pretty easy to send the failure to them directly, 2) but for requests that arrive after the Node has failed, you cannot have "Complete"d the `Node`, otherwise it won't run again
so i think that what you were doing in https://github.com/gshuflin/pants/commit/87eea13575c1791495dc07d804b7926e30665c61 is closer to what you need
h
what exactly is the `waiters` list meant to represent?
w
basically: if a `Node` fails, it cannot "Complete": it must go back to `NotStarted`
your second edit tries to do everything in `get`, but for anyone trying to `get` the Node later, you need to start running again, and the right way to do that is probably just to transition right to `NotStarted` rather than Completing
@hundreds-breakfast-49010: look at the docstring on the struct (EDIT: sorry, nothing useful there. see below.)
(it's roughly: other Nodes/Futures that are waiting for this Node to complete)
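(For concreteness, a minimal, self-contained sketch of the two failure paths described above. This is not the actual pants graph code: `EntryState`, `Item`, and `Error` are placeholders chosen to mirror the conversation, and the real entry type is more involved.)

```rust
// A sketch of the two failure cases: notify waiters directly, but never
// memoize the failure -- transition back to NotStarted instead.
use std::sync::mpsc::{channel, Sender};

type Item = String;
type Error = String;

enum EntryState {
    NotStarted,
    // While running, other Nodes/Futures register here to be notified on completion.
    Running { waiters: Vec<Sender<Result<Item, Error>>> },
    // Only successful values are ever stored: "Nodes never store a failure".
    Completed(Item),
}

impl EntryState {
    // Case 1: the Node fails while it has waiters. Send the failure to them
    // directly, but do NOT store it: go back to NotStarted so that a later
    // `get` re-runs the Node instead of observing a memoized failure.
    fn fail(&mut self, err: Error) {
        if let EntryState::Running { waiters } = std::mem::replace(self, EntryState::NotStarted) {
            for waiter in waiters {
                let _ = waiter.send(Err(err.clone()));
            }
        }
    }

    // Case 2: a later request finds the Node NotStarted (because the failure
    // was not memoized) and must start it running again.
    fn get(&mut self) -> &'static str {
        match self {
            EntryState::NotStarted => "start running the Node again",
            EntryState::Running { .. } => "register as a waiter",
            EntryState::Completed(_) => "return the memoized value",
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    let mut entry = EntryState::Running { waiters: vec![tx] };
    entry.fail("rule threw an exception".to_string());
    // The waiter sees the failure...
    assert_eq!(rx.recv().unwrap(), Err::<Item, Error>("rule threw an exception".to_string()));
    // ...but the entry holds no memoized failure, so a new request re-runs it.
    assert_eq!(entry.get(), "start running the Node again");
}
```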
h
that makes sense
the issue I ran into with that earlier commit
w
@hundreds-breakfast-49010: i would say that probably the clearest way to guide this refactoring would be to change the type stored in `Completed` and `EntryResult` from a `Result<Item, Error>` to just `Item`
because the goal is approximately: "Nodes never store a failure"
and so the typesystem can help here.
or maybe just changing the thing inside `Completed` to an `Item`...?
one of those.
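(A before/after sketch of the type change being suggested; the shapes below are illustrative, not the actual `EntryState`/`EntryResult` definitions in the pants source.)

```rust
type Item = String;
type Error = String;

// Before: a Completed entry can hold a failure, so the graph can hand back a
// memoized exception on later requests.
mod before {
    use super::{Error, Item};
    pub enum EntryState {
        NotStarted,
        Running,
        Completed(Result<Item, Error>),
    }
}

// After: Completed can only hold an Item. A failed run has nowhere to be
// stored, so the only legal transition on failure is back to NotStarted, and
// the type system enforces "Nodes never store a failure".
mod after {
    use super::Item;
    pub enum EntryState {
        NotStarted,
        Running,
        Completed(Item),
    }
}

fn main() {
    // With the "after" shape, storing a failure cannot even be expressed:
    // after::EntryState::Completed(Err("boom".to_string())); // does not compile
    let _ok = after::EntryState::Completed("a value".to_string());
    let _still_possible = before::EntryState::Completed(Err("boom".to_string()));
}
```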
h
was that if I just dropped down to that `NotStarted` case when a rule threw an exception, I would see the deliberate exception I put in `create_binary_rule` get thrown about eight times, which seemed excessive
but now that i think about it, I suspect that's what we want
since I think we expect that most exceptions that rules throw in practice will be nondeterministic
w
yea, unfortunately: that's what we're asking for.
h
so, if, during the graph traversal process, we get a node that throws, we set it to `NotComplete`
w
i continue to be uncomfortable doing this, to be clear. avoiding non-determinism would be a better goal. cc @happy-kitchen-89482
h
and then anything upstream of it that needs it will just request that node, since it's `NotComplete`
and it's only in this artificial case where I'm deliberately always throwing that I see the rule run and the exception thrown multiple times
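(A toy model of why a deliberately always-failing rule runs once per upstream request when failures are not memoized; this simplifies the engine down to a single entry and a run counter.)

```rust
// Not the engine's actual code: a simplified model of "failures are not
// memoized, so every dependent request re-runs the failing node".
enum Entry {
    NotStarted,
    Completed(String),
}

fn get(entry: &mut Entry, runs: &mut u32) -> Result<String, String> {
    match entry {
        Entry::Completed(item) => Ok(item.clone()),
        Entry::NotStarted => {
            // Run the underlying rule. Here it always throws.
            *runs += 1;
            let result: Result<String, String> = Err("deliberate exception".to_string());
            match result {
                Ok(item) => {
                    // Successes are memoized as usual.
                    *entry = Entry::Completed(item.clone());
                    Ok(item)
                }
                // On failure, stay NotStarted: nothing is memoized, so the
                // next dependent that asks for this node will run it again.
                Err(e) => Err(e),
            }
        }
    }
}

fn main() {
    let mut entry = Entry::NotStarted;
    let mut runs = 0;
    // Eight upstream requests, as in the create_binary_rule experiment:
    for _ in 0..8 {
        let _ = get(&mut entry, &mut runs);
    }
    // With a memoized failure this would be 1; without it, it is 8.
    assert_eq!(runs, 8);
    println!("rule ran {} times", runs);
}
```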
w
but if we're certain we need to, then that would be the way.
h
@witty-crayon-22786 benjy's argument is that we can't guarantee that users will be perfect about writing rules that don't throw, and we don't want to poison the cache with the failed results of these rules
anyway, if, hypothetically, we were able to enforce that every python `@rule` was completely deterministic, I think it would still be fine to change the type in the way you describe
w
we can by making it very hard to do the wrong thing
h
because in that hypothetical scenario we would be statically enforcing that a rule either always returns something useful, or always fails, and the latter would be useless
w
caching exceptions makes it hard to do the wrong thing
h
"caching exceptions makes it hard to do the wrong thing" <- I'm not sure I understand the logic behind this, and I think that's becuase I'm not clear on what does and does not constitute "the wrong thing"
w
the reason you care about caching exceptions at all is that you are trying to do something non-deterministic in a rule, correct?
if you weren't trying to do something non-deterministic, it would be a tree falling in a forest.
h
yeah, I think most of these cases are in having rules that make network requests, which we don't yet have an intrinsic for
w
for precisely this reason.
h
but even if we do implement that intrinsic, there will be other things we're not thinking about at the moment, that prove to be nondeterministic
or maybe things we think are deterministic but that, in rare cases, turn out not to be
w
the thing about network requests is that in most cases they are non-deterministic even when they succeed
h
yeah, we definitely do want an intrinsic for network requests, we just don't have one yet (I was starting to work on one and maybe should jump back to it in the very near future)
w
it's all tied up in the same question, so doing a bit of design for the entire question/problem ("we'd like to run a potentially non-deterministic resolve codepath") would be good
because i think that i've suggested breaking your resolver out into a pex/separate process, and i think that i would still recommend that.
because non-determinism of processes is a pattern we already expect to need to account for (due to flaky tests, resolver network access, etc)
h
it sounds like you're suggesting that anything that pants does that might be legitimately nondeterministic should be put into a separate process so it can be run with ExecuteProcessRequest, which already handles process nondeterminism?
w
that's roughly where we are right now, yes.
none of the built-in rules are non-deterministic: only those intrinsics (ExecuteProcessRequest and UrlToFetch)
this also connects to the retry question.
h
if I remember correctly we had a draft design for the retry question and then were able to work around it for tests so it got tabled
but yeah the broader question is definitely tied into the same notion of correctly handling nondeterminism
benjy's going to say that no matter what we do, we can't perfectly enforce that no user ever creates a non-deterministic `@rule`
(although as you say we can make it harder to do the wrong thing)
w
we can with the right sandboxing. certainly if we banned/whitelisted stdlib imports, it would go a long way.
(bazel went so far as to use starlark, which was likely "too much" sandboxing... or at least the wrong compatibility tradeoff)
h
right now there's nothing stopping a rule-writer from importing arbitrary python into their rules, right?
like, if someone writes a plugin that provides some rules, they can import their own library, right?
and use that library code in their rules?
if we allow that then there's no way we can in principle guarantee that no exception will be thrown in a rule
I think even if we strictly whitelist std library imports that will be really hard to get right
w
sure. but whitelisting certain stdlib imports is definitely a thing that we should discuss doing.
h
I understand your discomfort with this (and I will read this thread in detail for more context). I agree that the topline has to be "rules must be deterministic", and I'm confident we can make our rules adhere to that. I'm just concerned about balancing that out with a concession to human error, especially with in-repo custom rules at various orgs.
w
@happy-kitchen-89482: what about leaning in on import whitelisting instead as a way to solve the same problem?
the idea would be to guide folks toward using external processes to isolate non-determinism
h
we can build abstractions within pants to isolate non-determinism other than ExecuteProcessRequest
that's what the network request intrinsic would be (or any other similar intrinsics we wrote)
w
Sort of: it depends on whether the network request intrinsic would need to expose failure to @rules
h
hm, actually, right now requesting an EPR can fail, right?
and that failure is manifested as a python exception, right?
yeah we have that `ProcessExecutionFailure` exception
it's raised just like any other python exception in a rule, so a client of this rule wouldn't even be able to manually catch that exception; you just have to be aware of the fact that if you request an `ExecuteProcessResult` from a `FallibleExecuteProcessResult` it might fail
so maybe this is something we should remove from the pants codebase, but it's rule-nondeterminism that we already have and deliberately put there
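(Sketching the relationship described above, in Rust for consistency with the other sketches here; in pants these are Python-side types and the conversion happens in a @rule, so the field names and error payload below are illustrative only.)

```rust
// Requesting an ExecuteProcessResult from a FallibleExecuteProcessResult
// "fails" (raises, on the Python side) whenever the exit code is non-zero;
// a rule consuming ExecuteProcessResult never handles the failure directly.
struct FallibleExecuteProcessResult {
    stdout: Vec<u8>,
    stderr: Vec<u8>,
    exit_code: i32,
}

struct ExecuteProcessResult {
    stdout: Vec<u8>,
    stderr: Vec<u8>,
}

// Stand-in for the ProcessExecutionFailure exception.
#[derive(Debug)]
struct ProcessExecutionFailure {
    exit_code: i32,
}

fn upgrade(
    fallible: FallibleExecuteProcessResult,
) -> Result<ExecuteProcessResult, ProcessExecutionFailure> {
    if fallible.exit_code == 0 {
        Ok(ExecuteProcessResult {
            stdout: fallible.stdout,
            stderr: fallible.stderr,
        })
    } else {
        Err(ProcessExecutionFailure {
            exit_code: fallible.exit_code,
        })
    }
}

fn main() {
    let failed = FallibleExecuteProcessResult {
        stdout: Vec::new(),
        stderr: b"resolver error".to_vec(),
        exit_code: 1,
    };
    // This is the failure-shaped nondeterminism that rules already see today.
    assert!(upgrade(failed).is_err());
}
```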
w
yes. But the question is whether we want to encourage more of it and expose it further to users, or to isolate it to the process usecase and see if that makes things simpler
h
Just to clarify to myself: It should always be correct to not memoize something, right? It might impact performance but never correctness.
w
correct
h
@witty-crayon-22786 one thing @happy-kitchen-89482 suggested was that, maybe we want failure-memoization to be something toggleable on a per-rule basis
i.e. we would default to our current behavior, of caching rust `Err` values representing an exception-throwing `@rule`, but if we had a rule we knew to be flaky (so, assuming that most of the time most `@rule`s are deterministic), we could add an argument marking the rule as such
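(Purely hypothetical: a sketch of what the engine-side decision for such a per-rule toggle could look like. The `memoize_failures` flag and the types below are invented for illustration and do not exist in pants.)

```rust
// Hypothetical per-rule failure-memoization toggle, engine side.
type Item = String;
type Error = String;

struct Rule {
    // Would default to true (current behavior: cache the Err), and be set to
    // false for rules known to be flaky.
    memoize_failures: bool,
}

enum EntryState {
    NotStarted,
    Completed(Result<Item, Error>),
}

fn complete(rule: &Rule, entry: &mut EntryState, result: Result<Item, Error>) {
    match result {
        Ok(item) => *entry = EntryState::Completed(Ok(item)),
        Err(e) if rule.memoize_failures => *entry = EntryState::Completed(Err(e)),
        // Flaky rule: drop the failure and let the next request re-run it.
        Err(_) => *entry = EntryState::NotStarted,
    }
}

fn main() {
    let flaky = Rule { memoize_failures: false };
    let strict = Rule { memoize_failures: true };
    let mut a = EntryState::NotStarted;
    let mut b = EntryState::NotStarted;
    complete(&flaky, &mut a, Err("network blip".to_string()));
    complete(&strict, &mut b, Err("bad @rule".to_string()));
    assert!(matches!(a, EntryState::NotStarted));
    assert!(matches!(b, EntryState::Completed(Err(_))));
}
```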
h
I think this would really help when developing new rules in user repos, while keeping all core pants rules, and most others, stricter.
h
@witty-crayon-22786 re-pinging since the last time this thread was active was before thanksgiving - do you think an option on `@rule`s to control whether or not thrown exceptions get memoized is a reasonable way to handle flaky rules?
w
That's one potential UX for this case, but I don't think it makes the problem simpler.
Might make it more complex.
I think that I'd personally rather just remove memoization of exceptions entirely, and do something else to discourage non-determinism (some sort of basic symbol banning/inclusion/exclusion list, if we can think of a way to do that)