https://pantsbuild.org/ logo
#development
Title
# development
h

hundreds-breakfast-49010

01/18/2020, 12:29 AM
any chance I could get some more eyes on https://github.com/pantsbuild/pants/pull/8973 ? particularly @witty-crayon-22786 or @average-vr-56795
w

witty-crayon-22786

01/18/2020, 1:23 AM
hypothetically, HttpRequester is not side-effecting...?
just potentially not cacheable?
h

hundreds-breakfast-49010

01/18/2020, 1:25 AM
HttpRequester
is definitely side-effecting if we make it the standard way to do all HTTP operations (including POSTs), which is what I had in mind with it
I only added GET functionality in this PR since it's what toolchain needs first, but, assuming we go with this design for making HTTP requests in general, we'd definitely want to add side-effecting functionality to it, ideally soon
maybe I should've documented that more clearly in the PR or docstrings
w

witty-crayon-22786

01/18/2020, 1:27 AM
are you envisioning that we would have a separate non-sideeffecting HTTP api that could be used deeper in the graph?
h

hundreds-breakfast-49010

01/18/2020, 1:29 AM
no, I'm envisioning that we wouldn't do this, all HTTP requests would happen through
HttpRequester
, other than the
UrlToFetch
abstraction we already have
I don't think we can build such a separate non-sideeffecting HTTP api safely right now anyway, since that depends on 6598
w

witty-crayon-22786

01/21/2020, 9:52 PM
...given that we are 100% certain that we need to implement #6598, does that change your answer?
h

hundreds-breakfast-49010

01/21/2020, 10:15 PM
I'm honestly not sure. part of me genuinely likes the idea that we keep the class of
@side_effecting
types, and keep the constraint that
@side_effecting
types can only be requested in a
@goal_rule
. this constraint forces rule implementors to "push out" operations that are conceptually side effecting to `@goal_rule`s. and I think that the other option, which is to just let ordinary
async Get[HttpResponse](HttpRequest, req)
calls work (while dirtying a bunch of the node graph, or sometimes dirtying a bunch of the node graph and sometimes not depending on the type and options of the
HttpRequest
) , makes it harder to think about when a rule will and will not be cached, which is undesirable
on the other hand, it would be convenient for rule authors to be able to just write ``async Get[HttpResponse](HttpRequest, req)` and the engine take care of invalidating only the things that need to be invalidated when the result of this HTTP request needs to change
do you agree that this tension between these two ways of implementing HTTP (and more broadly any type of side-effecting rule, including EPR if we implement the
foreground
flag) within pants exists?
w

witty-crayon-22786

01/21/2020, 10:21 PM
i agree that side-effecting should be at the root... that makes sense. the part that doesn't make sense is: not all HTTP requests are sideeffecting.
foreground is uncacheable, but it is still "sandboxed" to the same extent that all process invokes are
...so it is not "sideeffecting", any more than any process is.
h

hundreds-breakfast-49010

01/21/2020, 10:26 PM
I think we're operating from different perspectives of what side-effecting means
I've been thinking about side-effecting as meaning anything that causes pants to have to wait for external input at the time that the user runs a command - so, waiting on an HTTP requests counts, as does running a --debug process (becuase pants has to wait for the user to type in commands live into the terminal)
w

witty-crayon-22786

01/21/2020, 10:28 PM
i think that sideeffecting has a pretty distinct meaning: "making (untracked?) changes to the outside world". making a POST request to a web service, or deleting/editing a file in your repository (or anywhere on your system, for that matter) would be sideeffects
h

hundreds-breakfast-49010

01/21/2020, 10:28 PM
and I think you're calling a subset of those things "uncacheable" but not "side-effecting", and arguing that it's okay if uncacheable but not side-effecting effects happen in the engine, dirtying nodes in the way that 6598 will implement - and that only things that are truly side effecting from the perspective of affecting the outside world at runtime actually need to be pushed into `@goal_rule`s
w

witty-crayon-22786

01/21/2020, 10:29 PM
i don't think that user input in to a process is a sideeffect, any more than changing your CLI args is a sideeffect.
user input does mean that you would want to re-run if the inputs might have changed... and that requires it not to be cached, because we can't record the user inputs and replay them (and wouldn't want to if we could)
h

hundreds-breakfast-49010

01/21/2020, 10:32 PM
ah, and I've been thinking of user input to a process as being a side-effect, because in practice doing this requires guaranteeing that we grab the terminal
w

witty-crayon-22786

01/21/2020, 10:39 PM
another heuristic to think about sideeffecting things is "is it safe to retry automatically without any configuration?" (although this is partial: there are sideeffecting things that are safe to retry)
we should absolutely, 100% automatically retry processes and UrlToFetch requests automatically
h

hundreds-breakfast-49010

01/21/2020, 10:41 PM
I don't actually think this is safe to do with ordinary HTTP GET requests, because of timeouts
if a server is down, we don't want pants to sit there spinning making requests at it that will never succeed
so in this sense, having to deal with timeouts "re-introduces" side-effecting semantics to HTTP GET, even though GETs are, on their own terms, supposed to be idempotent
w

witty-crayon-22786

01/21/2020, 10:42 PM
i'm not suggesting infinite retry.
retrying and failure do not introduce "side-effects". not by the definition above.
being non-deterministic is not a side-effect.
e

enough-analyst-54434

01/21/2020, 11:01 PM
I'm having trouble tracking all this. To put a finer point on things, how is HttpRequester different from InteractiveProcessRunner? Is there a specific difference that causes pause with the HttpRequester PR or is it rather a stop-the-bleeding concern, ie: we shouldn't have been doing it this way for InteractiveProcessRunner either. If the latter, does our repeated quoting of "private APIs still" underwrite moving forward?
h

hundreds-breakfast-49010

01/21/2020, 11:07 PM
@enough-analyst-54434 I don't think it is different. I mentioned on https://github.com/pantsbuild/pants/pull/8974#issuecomment-574924016 that I think if we go with stu's conception of what is and is not side-effecting, we probably don't want IPR as a distinct thing from EPR
w

witty-crayon-22786

01/21/2020, 11:08 PM
HttpRequester and IPR are similar. in the IPR case we have an alternative API that works for the
test --debug
case, but not for the
run
case. similarly, in the HttpRequester case, there is likely a non-sideffecting API that could be used deeper in the graph to make GET requests.
e

enough-analyst-54434

01/21/2020, 11:09 PM
That suggests moving forward then with HttpRequester and circling back later if unification / better ideas eventually come along.
w

witty-crayon-22786

01/21/2020, 11:13 PM
maybe. but there are also alternatives that have been discussed a few times, including writing whatever logic is hitting the network as a process
for example: would i currently suggest running PEX in process? no: i would suggest subprocessing it
depending what kind of logic we're trying to write, the answer might be "it should be in another process".
if this is blocking something important for you, and doing this in process is truly the right answer, then i won't block. i just don't have perspective on the backstory and motivations.
e

enough-analyst-54434

01/21/2020, 11:20 PM
depending what kind of logic we're trying to write, the answer might be "it should be in another process".
(I don't understand slack) ... It feels like we've been unable to pin this down for several months or more and this seems like a basic thing (foundational) that we need to have ready answers for that don't require consulting oracles. I'm guessing everyone would like to have this nailed down but it requires more time than anyone is willing to invest.
w

witty-crayon-22786

01/21/2020, 11:25 PM
basic and foundational are two very different things.
i'm not blocking here. rather than an oracle, you should consider me to be someone advising caution in fundamental apis
e

enough-analyst-54434

01/21/2020, 11:26 PM
They are not. To a rule write basic may happen to be foundational. They don't really care about the distinction.
w

witty-crayon-22786

01/21/2020, 11:26 PM
the foundation decides what should be basic or not basic... easy vs hard vs impossible
writing a complete network aware system in the engine is an existential question: is that what it is for, or not?
e

enough-analyst-54434

01/21/2020, 11:28 PM
The oracle bit is what it feels like though, I think the reason for that is we don't have a high enough ratio of folks all understanding the same thing. We need a higher ratio to make fluid forward progress. I'm not sure how to get there.
The engine conversations have often felt painful for a good while.
w

witty-crayon-22786

01/21/2020, 11:29 PM
@enough-analyst-54434: i would say that some of the reason this feels oracle like is that i have no idea what the actual usecase is, and need to speak in generalizations. this feels like an example of the kind of trap twitter frequently gets in where we attempt to make a change without motivating it, and you push back wanting to know the usecase
if it would help to meet and discuss the usecase privately, i'm happy to.
e

enough-analyst-54434

01/21/2020, 11:30 PM
Makes sense. I'm not exactly sure of the usecase - I'm away too much, but fwict its calling external HTTP APIs.
I would guess that would be a pretty routine thing for a rule author to need to do? Could be wrong.
w

witty-crayon-22786

01/21/2020, 11:32 PM
the only usecase i know of (literally! not because i doubt that you have one!) is to implement a resolver
and we have moved in the direction of having that out of process, because it is language specific
e

enough-analyst-54434

01/21/2020, 11:32 PM
I can imagine Medium wanting to post info to datadog. They did that sort of thing back when I worked with them,
w

witty-crayon-22786

01/21/2020, 11:32 PM
(ivy, coursier, pex, yarn, go fetch, etc)
e

enough-analyst-54434

01/21/2020, 11:32 PM
That;s the 1st off the top.
w

witty-crayon-22786

01/21/2020, 11:33 PM
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1579649573065700?thread_ts=1579307375.032400&cid=C0D7TNJHL metrics gathering feels inherent to the graph (similar to the workunit stuff) rather than something that you'd put in a rule....
but rather than hypotheticals: @hundreds-breakfast-49010: would it be useful for you to write privately somewhere about what the usecase is...? or for us to meet?
e

enough-analyst-54434

01/21/2020, 11:34 PM
And I really don't think this is private is it @hundreds-breakfast-49010?
Thinking broadly though, toolchain will clearly provide some for-pay Pants functionality, no doubt some that form part of the rule graph. To do so, we'll need to talk to an API to perform some proprietary work. To do so, we'd currently need to write a small CLI to wrap any api call. That would clearly work, it just feels a bit unfriendly imagining a world where various companies provided various for-pay rules. A first class API like EPR seems a bit more friendly. The extreme here is Pants writes a CLI for requests and packages it up as a cross-platform pex. We then have a set fof types and rules that expose what looks like an API to make HTTP requests but actually does an UrlFetch of that generaic HTTP client pex and EPRs it.
Maybe that makes sense, but it seems a bit baroque.
h

hundreds-breakfast-49010

01/22/2020, 1:51 AM
ah sorry missed this convo, my slack page seems to have lost its internet connection without me noticing it
but yeah what @enough-analyst-54434 said is what toolchain wants, and it's not private at that level of abstraction
I'm trying to get a first-class, generic, HTTP API merged into pants, starting with GET requests, so that toolchain can write pants rules that make requests against toolchain's API and perform different business logic depending on the results of the request
writing an entirely separate script that makes an HTTP request, and running it as an EPR is awkward; and also we don't currently have a way to tell an EPR to stop paying attention to its cached result after a timeout and run again
w

witty-crayon-22786

01/22/2020, 4:37 PM
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1579658717067900?thread_ts=1579307375.032400&cid=C0D7TNJHL this is evident from what we have discussed, but doesn't do much to clarify. i really think we need to talk about the concrete usecase.
e

enough-analyst-54434

01/22/2020, 5:02 PM
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1579658717067900?thread_ts=1579307375.032400&cid=C0D7TNJHL is pretty concrete. Let's say the API takes a list of python requirements and produces a list of distribution urls. The API is protected by authentication. We want to write a rule that uses the API. I can't imagine other relevant details.
a

average-vr-56795

01/22/2020, 5:04 PM
Do we expect deterministic results from that API?
(Assuming good auth)
e

enough-analyst-54434

01/22/2020, 5:04 PM
Yes
a

average-vr-56795

01/22/2020, 5:08 PM
So for that use-case: • No observable side effects (but hitting the cache may save some money) • Deterministic output • No need for special invalidation of dependees • Auth should probably be a special concern where it’s a param to the rule, but doesn’t form part of the caching identity (i.e. multiple auth tokens may be considered equivalent in terms of getting cache hits, but should be considered distinct in terms of if we got an auth failure, and retry with a new auth token, we actually perform the request • We may need to integrate with some kind of non-deterministic auth subsystem to provide that param - ideally we don’t need to re-auth every build, but also ideally the graph would have functionality like “If I see an auth failure, I’ll re-auth and retry the rule” (the last two are very similar to a concern we have for remote execution auth (and punted on))
e

enough-analyst-54434

01/22/2020, 5:12 PM
And I'd argue that use case is likely basic to dredge up contentious framing. If it's difficult or impossible to implement we have a problem to solve.
a

average-vr-56795

01/22/2020, 5:14 PM
I like the use-case. I’m a little scared that there probably are side-effecting use-cases. But if we’re happy to build a solution which doesn’t solve them, that works. For that case, it sounds like the only thing we’re missing is the way to inject an auth token, and specialising the caching to handle that? Would we be happy specialising the problem specifically to oauth, and providing an oauth intrinsic which the engine treats specially?
w

witty-crayon-22786

01/22/2020, 5:30 PM
if daniel's guesses about the usecase are correct, it doesn't seem like the API should be marked sideeffecting, and should thus be usable deeper in the graph
(which would make it much more useful, and which is where i thought Greg's design doc was headed: https://docs.google.com/document/d/1CdlGMXWxdwpGqet7-k6DFIbg90IqR75cFg8SkRu06H8/edit )
a

average-vr-56795

01/22/2020, 5:38 PM
(But the auth thing is side-effecting, in a really specific weird way, which is why it would be nice to special case that if possible… If we want to generally solve that kind of problem, we do need to solve the whole big scary one)
(Effectively the auth node would be the scary one, not the http node)
w

witty-crayon-22786

01/22/2020, 5:39 PM
yea, i think that the auth thing was raised in the context of https://github.com/pantsbuild/pants/issues/8770 ... (see the last comment)? but i feel like auth/caches of that sort should be an intrinsic
@hundreds-breakfast-49010: if the above is true, it's exciting, because the intrinsic can go deeper in the graph, right?
h

hundreds-breakfast-49010

01/23/2020, 9:12 PM
what do you mean by "the above"?
I like the use-case. I’m a little scared that there probably are side-effecting use-cases. But if we’re happy to build a solution which doesn’t solve them, that works.
The side-effecting type
HttpRequester
does solve side-effecting http use cases automatically
w

witty-crayon-22786

01/23/2020, 9:17 PM
the last 5 or 6 comments from daniel and i
h

hundreds-breakfast-49010

01/23/2020, 9:19 PM
I'm not sure if it makes sense to assume oauth or any other type of http authenticiation into a pants intrinsic
I'm not actually particularly knowledgable about the details of oauth (or any other auth system), so I'm not sure what it would entail to do so
I think @average-vr-56795’s argument is that, if we had the engine automatically take care of authentication, we'd be able to get non-side-effecting semantics out of GET again and therefore let a GET request intrinsic be okay within the cacheable part of the rule graph
this is basically a question of whether we want pants to assume some details of how auth is going to work for any client of pants, or have pants give rule-authors generic tools to handle their own auth, and I lean towards the latter
w

witty-crayon-22786

01/23/2020, 9:24 PM
not the oauth+http part, i think: just the ability to create a tiny persistent cache somehow.
i think that the auth/persistent-cache part is somewhat orthogonal (...well, it's related. but building in support for HTTP auth would definitely jive with intrinsic usage of http.)
the larger question is whether you agree that your concrete usecase does not require sideeffects in http
ie: auth requiring sideeffects is different.
h

hundreds-breakfast-49010

01/23/2020, 9:28 PM
within http our concrete usecase doesn't need HTTP sideeffects
so if we're treating auth sideeffects and also timeout/retry sideeffects as different, then yes a GET http request doesn't side-effect
w

witty-crayon-22786

01/23/2020, 9:29 PM
timeout/retry are not sideffects
they're non-determinism.
so yea, i think that that means you're good moving the API deeper.
which, again: i think is really exciting.
h

hundreds-breakfast-49010

01/23/2020, 9:31 PM
I'm good moving the GET API deeper, closer to what I had in mind with https://docs.google.com/document/d/1CdlGMXWxdwpGqet7-k6DFIbg90IqR75cFg8SkRu06H8/edit , but that means we have to first solve the problems around non-determinism deep within the rule graph
which is something we want to do anyway
w

witty-crayon-22786

01/23/2020, 9:31 PM
we're forced to address the split with IPR and foreground, because some concrete usecases need sideeffects there. but not here with http
yep.
i drafted 6598 on the plane back from oneteam, but haven't tested it yet.
👍 1
h

hundreds-breakfast-49010

01/23/2020, 9:34 PM
okay, so once that's merged, we'll be able to write a HTTP get intrinsic that will handle re-auth/timeouts/HTTP failures in a way that rules don't have direct access to
w

witty-crayon-22786

01/23/2020, 9:35 PM
yep!
i'll take a swing either tonight or this weekend.
h

hundreds-breakfast-49010

01/23/2020, 9:35 PM
that intrinsic will be marked as
uncacheable
as just a property of that specific intrinsic (other intrinsics are still
cacheable
)
w

witty-crayon-22786

01/23/2020, 9:36 PM
yep.
with ... some sort of semantics.
the implementation details of intrinsics are very private, and very malleable.
so replacing the
Node::cacheable
flag with something more nuanced is 100% fine.
h

hundreds-breakfast-49010

01/23/2020, 9:37 PM
and it will have to return some type that's generic over success/failure, like
FallibleHttpResponse
- becuase what non-determinism means is that, with the exact same
HttpGetRequest
type fed to the engine, the intrinsic might return a success or failure, depending on all the details of timeouts, etc. that the intrinsic will take care of
w

witty-crayon-22786

01/23/2020, 9:38 PM
i agree with Daniel's comment on the design doc about that aspect.
h

hundreds-breakfast-49010

01/23/2020, 9:39 PM
which comment is that?
h

hundreds-breakfast-49010

01/23/2020, 9:40 PM
ah yeah
w

witty-crayon-22786

01/23/2020, 9:40 PM
basically: user code should generally not need to deal with an HTTP request failing... although 404 might be a weird special case in this regard.
h

hundreds-breakfast-49010

01/23/2020, 9:41 PM
that's something the intrinsic will have to decide
w

witty-crayon-22786

01/23/2020, 9:41 PM
i don't think so
h

hundreds-breakfast-49010

01/23/2020, 9:41 PM
of course it could decide this based on options in the
HttpGetRequest
type it gets fed
w

witty-crayon-22786

01/23/2020, 9:41 PM
...oh. i see what you mean. with regard to whether it retries?
yea, i suppose. that's clear-ish from HTTP semantics though.
(client errors vs server errors)
h

hundreds-breakfast-49010

01/23/2020, 9:42 PM
yeah I can see the argument that you shouldn't retry a 4xx error based on http's semantics
the last thing that I'm worried about is building assumptions about auth into the intrinsic's internal retry logic
w

witty-crayon-22786

01/23/2020, 9:45 PM
as Daniel mentioned above, adapting UrlToFetch might be an option here (vs having a separate intrinsic). but i don't have a preference either way.
h

hundreds-breakfast-49010

01/23/2020, 9:45 PM
like I said I don't understand all the ways you can auth a http request super well
w

witty-crayon-22786

01/23/2020, 9:46 PM
me neither. but that would have needed to be figured out either way.
h

hundreds-breakfast-49010

01/23/2020, 9:47 PM
UrlToFetch
is implemented with an intrinsic today
so the question is, is there some form of http auth such that we're going to regret leaving it to an intrinsic to handle
I know that there are some systems where you make a request to an auth URL, get an auth token that's valid for say 1 hour, and then you craft your subsequent requests to an API with that token
how would a rule author write a rule that could cope with this?
w

witty-crayon-22786

01/23/2020, 10:09 PM
see above re: the cache intrinsic.
in short, have a tiny intrinsic that hits a persistent cache (with a configurable ttl if necessary) in the Store.
h

hundreds-breakfast-49010

01/23/2020, 10:12 PM
what's the API you're conceiving of this intrinsic using?
w

witty-crayon-22786

01/23/2020, 10:15 PM
hm? something like
await Get[OptionalCacheEntry](CacheKey(..))
h

hundreds-breakfast-49010

01/23/2020, 10:43 PM
https://gist.github.com/gshuflin/6d7778bad0a4feb1690f34afb7ad5268 <- I'm just sketching out ideas of what it might feel like to actually use the intrinsic API in practice