I think there's some confusion among different pan...
# development
h
I think there's some confusion among different pants developers over how we ultimately want rule cacheability to work, that's manifested as different people submitting PRs that make incompatible assumptions about what we should be building towards
a
yes -- i personally don't see this as a problem though unless you feel like you're doing work that then gets thrown away
h
a
reading
h
@aloof-angle-91616 I don't necessarily have a problem with doing work that ultimately gets thrown away (I mean, it's better to avoid that if possible, but it's legitimate to write code one day that gets replaced the next day by a better abstraction)
a
yeah most people get frustrated about that though and i was trying to understand the concern -- i personally throw away more code than i'll ever write
h
it's more that, I don't think we have a clear vision of what we should be building towards, in terms of the model for how pants rules should handle caching and nondeterminism
a
also notable is https://github.com/pantsbuild/pants/pull/8706 <= converting engine stacktraces to normal python stacktraces -- which means you can do a `try`/`except` around an
await Get[...](...)
h
and different people (myself included) are building abstractions that solve parts of the problem, but that are ultimately incompatible
a
ultimately incompatible
what conflicts are you seeing?
h
so, off the top of my head
-we're not sure what the purpose of console_rules is. should they be "top level" rules that map to a goal, or "rules that are explicitly marked as not cacheable regardless of whether they are top-level or not". this question has some downstream consequences cf. https://github.com/pantsbuild/pants/pull/8931
-in general how do we want to handle rules that fail? that's basically what https://github.com/pantsbuild/pants/pull/8858 is about
a
i think i might be the only person injecting the "let's generalize `@console_rule`" into the current discussion, and it was only for the case of using the legacy (?)
Scm
class, so we don't need to muck around with that right now. should have made that more clear in my response on the ticket
that's what i like about #8706 it literally makes engine errors into normal exceptions
h
I’ll throw my two cents in here as well. I advocate adding something like an
@uncachable_rule
type that can be used by non-console-rules As argued in https://github.com/pantsbuild/pants/pull/8931#issuecomment-572758552, I find the name
@console_rule
to be very helpful and that it should continue to be mapped to top-level goals. Or, perhaps we should rename it to be a
goal_rule
?
h
-how do we want to handle rules that do broadly side-effecting things? this includes the existing EPR mechanism, the InteractiveRunner mechanism I built a while ago that we're starting to use for more things, this touches on what @witty-crayon-22786 wants to do with adding an
interactive
flag to EPR, and this is related to the HTTP requesting functionality that I'm trying to build right now
a
that's what i was trying to get at with my response in your PR asking if we could generalize
@console_rule
-- it sounds like the use of the
Scm
class is one of many things we'd like to formalize control over
h
I feel like a while ago we decided that "side-effecting" things should be forced to happen in noncacheable console_rules, which is what https://github.com/pantsbuild/pants/pull/8922 is about enforcing
but we're not currently sure what we want
@console_rule
to be
a
i don't see an issue with making it "`@console_rule` = uncacheable" just for now and then (for the case of the http intrinsic, for example) generalizing that as necessary
that approach might avoid having to block on a discussion of what
@console_rule
should be
h
my original plan for HTTP intrinsics (https://github.com/pantsbuild/pants/pull/8834) is something that I'm abandoning on the grounds that HTTP requests are inherently side effecting and should therefore be forced to happen in a
@console_rule
- but if we change the semantics of `@console_rule`s or get rid of them entirely, that re-opens that discussion about HTTP
a
HTTP requests are inherently side effecting
i personally disagree with that conclusion
h
basically I think we need some coordination around how we want cacheable and non-cacheable rules to work at a high level
since this is turning out to effect multiple pieces of work that multiple people are working on
a
what kind of coordination are you looking for?
i think it's a reasonable thing to ask for
h
@aloof-angle-91616 I think that one thing that might make sense is to create a design document that answers exactly how we will go about closing the issue https://github.com/pantsbuild/pants/issues/6598
since that issue has been outstanding for over a year and I think it touches at the heart of how cacheability and non-cacheability should work in pants rules
a
i don't like the idea of blocking on having everyone agree on a design in order for people to be able to make forward progress
it feels like there's a difference between what people would want as the perfect most blessed v2 API, vs the v2 API that allows everyone to continue building the features we are now building on it
h
and me, you, @hundreds-father-404 @witty-crayon-22786 and @happy-kitchen-89482 should all make sure that whatever solution to #6598 we come up with, solves: -the design of http requests in pants -local interactive subprocess execution like what we just implemented for test --debug -what the ultimate story for console_rules should be -what the ultimate story for handling non-deterministic failures in rules should be
a
there was a really strong disagreement as i recall for the design of http requests in pants and it seems like we might want to avoid having to agree on that in general
one way to scope this better might be: for a proposal from you/toolchain about the http request feature you want to have, describe what that feature would need from the #6598 discussion
then we can potentially decouple having to agree on (the exact representation of http requests) from (the guarantees about cacheability that that representation requires)
h
so what I personally had in mind for http was to create a new side-effecting type that will make HTTP requests in
console_rule
that's what I made https://github.com/pantsbuild/pants/pull/8922 as prework for
a
have you posted the diff using that for http requests yet, or is that the near future plan?
h
it's hard to create an exact representation of HTTP requests without making some sort of decision about how the engine will invoke them, and that decision can't be made without having some idea of the ultimate cacheability story
@aloof-angle-91616 it's in a branch I'm working on today
I can certainly finish up that work and make a PR
it basically would amount to having a new uncacheable type called
HttpRequester
that works like
Console
👍 1
or
Workspace
a
it's hard to create an exact representation of HTTP requests without making some sort of decision about how the engine will invoke them, and that decision can't be made without having some idea of cacheability
yes. i'm thinking that a way to structure this is: (1) in making the http PR, define the cacheability guarantees needed for the https requests impl (having this explicitly written out with reference to #6598 would be cool) (2) we can then agree "are these reasonable assumptions we are all ok agreeing with" for #8922 (3) later, separately, we can agree on the http PR
it basically would amount to having a new uncacheable type called
HttpRequester
that works like Console
love it, this is aligned with what i was thinking
h
but cf. https://github.com/pantsbuild/pants/issues/6598 , "Post #6880, and with a solid solution for this issue (that makes it very obvious what is cacheable and what isn't), I think that we should be able to remove
@console_rule
."
so maybe the real heart of disagreement here
is that I'm going down a path that involves making noncacheable
@console_rules
a permanent and significant part of the pants rule execution model
👍 1
while simultaneously there are possible plans to change how the rule execution model works in such a way that gets rid of
@console_rule
entirely
a
I'm going down a path that involves making noncacheable @console_rules a permanent and significant part of the pants rule execution model
hm, i'm not sure i understand how your PR does that
h
https://github.com/pantsbuild/pants/pull/8922 is all about adding code that formally checks that some types (Workspace, Console, InteractiveRunner, and eventually HttpRequester) only can be arguments to `@console_rule`s
and the commonality that these types all have is that they make a side effect of some kind
h
That sounds reasonable. FWIT, I’m not wedded to the name
@console_rule
. I only want some way to signify that the rule that hooks up to a Goal is special. For example, I want to keep the new lint to ensure that goal rules return a
Goal
. I’d be very happy if we added
@goal_rule
.
👍 1
h
and yes this assumes that all HTTP requests, even GET requests, are "side-effecting" as far as pants is concerned
a
and https://github.com/pantsbuild/pants/pull/8548 for example checks
@dataclass
params for validity -- i don't think checking args for validity is venturing too far outside the box yet
h
the exact name
@console_rule
is a sort of side-issue - I certainly have no problem renaming this or adding additional types of
@rule
👍 1
a
@goal_rule
is cool too
👍 1
h
but I think you'd agree that there's a set of interlocking issues here, where a decision about one depends on decisions made about others
w
Sorry, you caught me during a dentist appointment
👨‍⚕️ 1
Will catch up on this.
a
we still have the ability to change our public symbols in the v2 API without a deprecation period until we release pants 2, and i think we should take advantage of that
💯 1
h
legit thanks for looking @witty-crayon-22786
@aloof-angle-91616 absolutely
a
but I think you'd agree that there's a set of interlocking issues here, where a decision about one depends on decisions made about others
yes, and it's possible to structure this so that we don't have to depend upon a big decision before moving forward, and it's possible to also structure this so that the precise application of the cacheability guarantees we're looking for can be shown in their full and proper context -- this can make it easier to have the discussion about what we want those guarantees to be
i would really like to move towards a type of looser governance model so we can avoid blocking work waiting until people have enough free time to review things in depth
w
Design isn't necessarily about "one big decision": just exploring a space to have mapped it out and understood some alternatives
✍️ 1
a
yes, and i'm concerned that's how it's being presented here
w
Will respond after grabbing lunch.
so, to respond directly to https://github.com/pantsbuild/pants/pull/8922#issuecomment-572703444 ... yes, I agree that #6598 is relevant, and i'm pretty sure that we should implement it
given that, if you see a clean way to evolve from what is implemented in #8922 once #6598 is implemented, then maybe all is well.
h
well, the notion of an uncacheable type that #8922 implements would still be useful
so the clean way to evolve from it would be to have the code that checks whether an input type is cacheable or not, do something else
but even talking about this assumes that the notion of a cacheable vs noncacheable input type makes sense. I think it does, but I could see how we might find in the process of implementing #6598 that it no longer makes sense
w
right. i expect that implementing #6598 would mean really truly having a rust Graph Node have a
cacheable
method/property
h
don't we have this today?
@console_rule
is currently poking that boolean on one of the rust data structures pertaining to nodes?
let me check the code
w
that would inherently be a property of a
Node
... going from "type" to
Node
is a bit weirder.
@hundreds-breakfast-49010: the ticket explains that.
...but yes, sorry. the property sortof exists, but just isn't safe to use in many places
h
specificallythe
Node
trait has a method
cacheable(&self) -> bool
w
yep.
sorry. yes.
so, two things:
1) #6598 should not be crazy hard to implement.
h
so, the rust type that gets created from a
@rule
implements this trait method in terms of whether
cacheable
is set on a rule, which is controlled by
@console_rule
vs
@rule
w
2)
Node
level cacheability is likely how things will stay in the graph. and that means that if you're going to go with the "uncachable type" semantics, you need to decide how that plays into whether a
Node
is cacheable.
h
right, so it's possible that with #6598, we might decide that we don't actually want uncacheable type semantics
w
er, i don't think that's likely.
h
if we probably do want uncacheable type semantics, then we do want to merge https://github.com/pantsbuild/pants/pull/8922 now, even if we will eventually change what that check does in the process of implementing #6598
that PR is formalizing the notion of uncacheable type semantics
w
oh, sorry: i see what you're responding to: 2.
yes. i think that uncachable nodes are likely more correct than uncacheable types.
the "uncacheable type" started as a workaround for not having #6598... i called it a "cute hack" here https://github.com/pantsbuild/pants/pull/8917#discussion_r364023925
(or in the thing linked from that paragraph)
h
so, how do types that inherently do side-effecting things play into this? this is Console, Workspace, InteractiveRunner, and my forthcoming HttpRequester type?
right now, I'm thinking that I'll mark all of these as uncacheable types
w
using that linked thing as an example: the SCM itself is not what is not cacheable: rather, calling a method of that type that forks a process is uncachable
@hundreds-breakfast-49010: i think (?) that sideeffecting is one category of uncacheable things.
h
yes
I think we should be clear on what we mean by the word "side-effecting"
I'm viewing side-effecting as basically a synonym of something that shouldn't be cached
w
i don't think that sideeffecting things should be in the graph. i do think that we are forced to have non-sideeffecting but uncachable things in the graph, for practical reasons.
☝️ 1
h
Console is side-effecting because it needs to do things locally that the user can see immediately in their console
InteractiveRunner is side-effecting for the same reason
w
yep
h
HTTP is side-effecting becuase some HTTP requests (POSTs) have to be run every time pants runs in order to change the state of some external system
and we're not sure about when we can and cannot cache GET requests
w
@hundreds-breakfast-49010: i do not think that they are synonyms though. side-effecting has an meaning, even if uncachable doesn't
h
what's an example of something non-side-effecting but uncacheable?
w
anything read-only that inspects the world in a way that we can't invalidate the graph for
h
so that's GET requests
w
for example: polling the file system in a location that we can't "watch" for.
h
what's an example of having to do that?
w
yes, in theory GET requests.
@hundreds-breakfast-49010: asking git whether things have changed
ie, the SCM example.
or literally just poking a file that we're unable to use watchman for (#4999 might make that unnecessary, but)
h
okay, so I think the real important distinction is between cacheable and un-cacheable things, not beween side-effecting and non-side-effecting things
i.e. asking git whether things changed and making a POST request or running a local interactive process are all non-cacheable, and that's the difference that should matter to the engine
even if only some of those tasks are "side-effecting"
h
okay, so let's think about HTTP GET requests
a GET request is non-side-effecting, it's non-cacheable, and it can also fail
w
yes.
h
and those last two things mean that if we had a rule that requested a type to make a HTTP request happen, that rule is potentially tainted by nondeterministic failure of the request
and so is anything downstream from it
do we expect #6598 to handle this case correctly?
w
it would, actually... because the uncacheable rule would run every time it was requested if it was not already running
so whether or not it failed the first time, it would run again and have another chance to not fail.
h
okay, cool. so I think that having 6598 in place affects the design of HTTP functionality
if we don't have it, then we need to make a HttpRequester something like how Workspace currrently works. and if we do have it, we no longer have that constraint
w
i believe that i might have commented to that effect on your design doc...?
if i didn't, i should have!
h
yeah you did mention something about 6598 there
"The intrinsic being non-cacheable doesn't mean that the rules that depend on it would necessarily be non-cacheable. Since rules are supposed to be deterministic, using "the current time" in a rule would be a no-no."
w
h
so, there's a distinction between a
@rule
that requests a
HttpResponse
from the engine, and the engine intrinsic that provides the
HttpResponse
itself, right?
w
re: the intrinsic: yea. you'd have to mark both the intrinsic and a rule feeding it a timestamp uncacheable
precisely.
h
the rule can be cacheable/noncacheable independently from the intrinsic
w
yes.
h
and 6598 is all about how upstream cacheability affects downstream cacheability
w
to be clear, that doesn't necessarily mean that that timestamp-centric API is a good idea... seems challenging to do correctly.
h
yeah, that was just an example
the important thing is having some way to eventually have the engine acutally make a new GET request becuase the old one is stale - and re-run any rules that depend on that new GET request
I mean, even better would be ifthe engine could make a new GET request and not rerun things if it verified that the contents of the new GET request were the same as the contents of the old one
but all of this is not stuff the engine currently does. it might do it when 6598 is fixed, depending on exactly how we implement the fix for 6598
so I think I was getting hung up there
I would basically want to design this http intrinsic differently depending on exactly what new semantics we get from 6598
w
yes.
h
maybe the best thing to do would be to focus on getting that change in, and I'd be happy to work on this right now, although I think I'd need some guidance from you on how to go about doing it
w
if #6598 ends up implemented exactly as described on #6598, i have a pretty good idea of what the semantics would be.
h
I think I want to grab #6598 then and implement it and get it merged
and that might clear up some of the confusion I have about where we should be going in the engine design-wise
w
sounds good.
it might be a really simple change, as i think about it more
basically, in the graph: at some point while a
Node
is completing, it should see whether any of its dependencies are uncacheable, and if they are, it should go to the
Completed
state with
dirty: true
set
h
so this is in
src/entry.rs
somewhre probably
in
fn complete
on the
Node
trait it looks like
er, no, on the
Entry<N>
type
w
yea, possibly...?