# development
w
@fast-nail-55400: i opened https://github.com/pantsbuild/pants/issues/16188, but there is an annoying aspect to it:
that’s solvable with some fiddling. but there is also a question of the intent of the timeout: at least to me, it seems like it should only apply to network time, so that adjusting the concurrency limit doesn’t make you more or less likely to hit the timeout
on the other hand, if we do want it to apply to the entire cache lookup, we’ll likely want to increase the timeout significantly, which makes it less meaningful
having rubber ducked on this (sorry), i think that it is probably worth the effort to figure out how to apply the timeout 1) at the network level, 2) only to cache reads … as described on the ticket.
a
IIRC a service is really cheap to construct as a wrapper around a channel? So making new ones, even per-request, should be ~free?
w
Yes, but they would need to share the concurrency limit.
Which... I think that you can do by supplying the semaphore to use... ?
f
requests in Tonic can carry extra typed data as an “extension”
then implement a Tower layer which checks for the extension for this type of timeout and applies the timeout
(assuming Tower layers can be used in such a manner to apply timeouts)
then if code wants the network-level timeout to apply, it would add this extension to the request
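The extension idea above can be sketched without tonic or tower at all. Below is a std-only stand-in: the `NetworkTimeout` marker type and the `Extensions` map are hypothetical simulations of tonic's typed `Request` extensions, and `effective_timeout` plays the role of the Tower layer's check. In a real stack the layer would wrap the inner service's call in an actual timeout; here we only demonstrate the opt-in lookup.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::time::Duration;

// Marker type: its presence on a request opts that request into a
// network-level timeout. (tonic keeps such values in `Request`'s
// typed extension storage; this is a stand-in.)
struct NetworkTimeout(Duration);

// Minimal stand-in for a typed extension map.
#[derive(Default)]
struct Extensions(HashMap<TypeId, Box<dyn Any>>);

impl Extensions {
    fn insert<T: Any>(&mut self, val: T) {
        self.0.insert(TypeId::of::<T>(), Box::new(val));
    }
    fn get<T: Any>(&self) -> Option<&T> {
        self.0.get(&TypeId::of::<T>()).and_then(|b| b.downcast_ref())
    }
}

// The middleware's decision: apply a timeout only if the caller tagged
// the request. A real Tower layer would wrap the inner service's call
// in e.g. `tokio::time::timeout` when this returns `Some`.
fn effective_timeout(ext: &Extensions) -> Option<Duration> {
    ext.get::<NetworkTimeout>().map(|t| t.0)
}

fn main() {
    let mut tagged = Extensions::default();
    tagged.insert(NetworkTimeout(Duration::from_secs(5)));
    assert_eq!(effective_timeout(&tagged), Some(Duration::from_secs(5)));

    // Untagged requests (e.g. writes) pass through with no timeout.
    let untagged = Extensions::default();
    assert_eq!(effective_timeout(&untagged), None);
    println!("ok");
}
```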
w
Yea, that seems like it would be pretty clean
f
if you haven’t started on this, I can work on it
w
i haven’t started it. but yea, that would be helpful.
i was leaning toward creating two services, one for reading and one for writing, and then having them share the `limit` layer
but whatever you’d like to do is fine.
f
what’s the benefit of having two services?
w
as Daniel said: they’re cheap. and it avoids the need to tag and interpret tags
but yea, whatever ends up being easiest.
thank you.
f
but the timeout would need to be below the concurrency limit in the stack, so I don’t see how two services changes that fact
w
and then having them share the `limit` layer
^ … i.e., use a single semaphore. iirc, there was a facility for this.
f
Ah, `ConcurrencyLimit::with_semaphore`
ah and each would have a different layer stack?
namely, a timeout layer in one but not the other?
w
yea. or they both do, but the default timeout for an unbounded stack would be very large until/unless we add configuration
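The "two services, one shared limit" shape above can be sketched with std only. The toy `Semaphore` below is a hypothetical stand-in for the shared semaphore a real tower `ConcurrencyLimit` would draw permits from; the point demonstrated is just that two independently-configured stacks (read and write) count against a single pool of permits. Each `call` here acquires and deliberately never releases, simulating an in-flight request.

```rust
use std::sync::{Arc, Mutex};

// Toy counting semaphore, standing in for the shared semaphore the
// real concurrency-limit layer would use.
struct Semaphore {
    permits: Mutex<usize>,
}

impl Semaphore {
    fn new(n: usize) -> Arc<Self> {
        Arc::new(Self { permits: Mutex::new(n) })
    }
    fn try_acquire(&self) -> bool {
        let mut p = self.permits.lock().unwrap();
        if *p > 0 { *p -= 1; true } else { false }
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
    }
}

// Two independent "services" (read stack and write stack) that share
// one limit. Each could carry a different layer stack on top, e.g. a
// timeout layer on reads only.
struct Service {
    sem: Arc<Semaphore>,
    name: &'static str,
}

impl Service {
    // Acquires a permit and holds it, simulating an in-flight call.
    fn call(&self) -> Result<&'static str, &'static str> {
        if self.sem.try_acquire() { Ok(self.name) } else { Err("at concurrency limit") }
    }
}

fn main() {
    let shared = Semaphore::new(2);
    let reads = Service { sem: shared.clone(), name: "read" };
    let writes = Service { sem: shared.clone(), name: "write" };

    assert!(reads.call().is_ok());  // permit 1 of 2
    assert!(writes.call().is_ok()); // permit 2 of 2
    assert!(reads.call().is_err()); // both stacks draw from one pool
    shared.release();               // one in-flight call finishes
    assert!(writes.call().is_ok());
    println!("ok");
}
```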
f
tower has a `Timeout` layer, so that’s a good starting point (only a starting point, since we want metrics, which it doesn’t have hooks for)
@witty-crayon-22786: I wonder if the `retry_call` function in `grpc_util` actually makes the problem worse, since it assumes that the timeout is only for network-level waiting and not the concurrency limit.
hmm maybe not, it only deals with exponential backoff for the sleep between retries, not a timeout on the actual call
https://github.com/pantsbuild/pants/pull/16196 is the draft so far. I don’t like the approach I took, though, in vendoring the Tower `TimeoutLayer`.
I may rewrite to have a custom layer with the sole purpose of detecting when a timeout error is emitted.
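A layer whose sole purpose is observing timeout errors might look like the std-only sketch below. `CallError` and `observe` are hypothetical names, and the atomic counter stands in for whatever metrics sink pants actually uses; a real Tower layer would do this classification in the response future's poll rather than on a plain `Result`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical error type emitted by the timeout middleware below us
// in the stack.
#[derive(Debug, PartialEq)]
enum CallError {
    Elapsed,
    Other(&'static str),
}

// Stand-in metrics sink.
static TIMEOUT_COUNT: AtomicU64 = AtomicU64::new(0);

// The observer layer's whole job: pass results through unchanged, but
// record a metric whenever a timeout error flows past.
fn observe<T>(res: Result<T, CallError>) -> Result<T, CallError> {
    if matches!(&res, Err(CallError::Elapsed)) {
        TIMEOUT_COUNT.fetch_add(1, Ordering::Relaxed);
    }
    res
}

fn main() {
    let _ = observe::<()>(Err(CallError::Elapsed));
    let _ = observe(Ok(42));
    let _ = observe::<()>(Err(CallError::Other("io")));
    // Only the Elapsed error was counted; results were untouched.
    assert_eq!(TIMEOUT_COUNT.load(Ordering::Relaxed), 1);
    println!("ok");
}
```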