# development
w
@fast-nail-55400: i opened https://github.com/pantsbuild/pants/issues/16188, but there is an annoying aspect to it:
that’s solvable with some fiddling. but there is also a question of the intent of the timeout: at least to me, it seems like it should only apply to network time, so that adjusting the concurrency limit doesn’t make you more or less likely to hit the timeout
on the other hand, if we do want it to apply to the entire cache lookup, we’ll likely want to increase the timeout significantly, which makes it less meaningful
having rubber ducked on this (sorry), i think that it is probably worth the effort to figure out how to apply the timeout 1) at the network level, 2) only to cache reads … as described on the ticket.
a
IIRC a service is really cheap to construct as a wrapper around a channel? So making new ones, even per-request, should be ~free?
w
Yes, but they would need to share the concurrency limit.
Which... I think that you can do by supplying the semaphore to use... ?
f
requests in Tonic can carry extra typed data as an “extension”
then implement a Tower layer which checks for the extension for this type of timeout and applies the timeout
(assuming Tower layers can be used in such a manner to apply timeouts)
then if code wants the network-level timeout to apply, it would add this extension to the request
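The extension idea above can be sketched without tonic or tower at all. Below is a std-only stand-in: the `NetworkTimeout` marker type and the `Extensions` map are hypothetical simulations of tonic's typed `Request` extensions, and `effective_timeout` plays the role of the Tower layer's check. In a real stack the layer would wrap the inner service's call in an actual timeout; here we only demonstrate the opt-in lookup.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::time::Duration;

// Marker type: its presence on a request opts that request into a
// network-level timeout. (tonic keeps such values in `Request`'s
// typed extension storage; this is a stand-in.)
struct NetworkTimeout(Duration);

// Minimal stand-in for a typed extension map.
#[derive(Default)]
struct Extensions(HashMap<TypeId, Box<dyn Any>>);

impl Extensions {
    fn insert<T: Any>(&mut self, val: T) {
        self.0.insert(TypeId::of::<T>(), Box::new(val));
    }
    fn get<T: Any>(&self) -> Option<&T> {
        self.0.get(&TypeId::of::<T>()).and_then(|b| b.downcast_ref())
    }
}

// The middleware's decision: apply a timeout only if the caller tagged
// the request. A real Tower layer would wrap the inner service's call
// in e.g. `tokio::time::timeout` when this returns `Some`.
fn effective_timeout(ext: &Extensions) -> Option<Duration> {
    ext.get::<NetworkTimeout>().map(|t| t.0)
}

fn main() {
    let mut tagged = Extensions::default();
    tagged.insert(NetworkTimeout(Duration::from_secs(5)));
    assert_eq!(effective_timeout(&tagged), Some(Duration::from_secs(5)));

    // Untagged requests (e.g. writes) pass through with no timeout.
    let untagged = Extensions::default();
    assert_eq!(effective_timeout(&untagged), None);
    println!("ok");
}
```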
w
Yea, that seems like it would be pretty clean
f
if you haven’t started on this, I can work on it
w
i haven’t started it. but yea, that would be helpful.
i was leaning toward creating two services, one for reading and one for writing, and then having them share the `limit` layer
but whatever you’d like to do is fine.
f
what’s the benefit of having two services?
w
as Daniel said: they’re cheap. and it avoids the need to tag and interpret tags
but yea, whatever ends up being easiest.
thank you.
f
but the timeout would need to be below the concurrency limit in the stack, so I don’t see how two services changes that fact
w
and then having them share the `limit` layer
^ … i.e., use a single semaphore. iirc, there was a facility for this.
f
Ah, `ConcurrencyLimit::with_semaphore`
ah and each would have a different layer stack?
namely, a timeout layer in one but not the other?
w
yea. or they both do, but the default timeout for an unbounded stack would be very large until/unless we add configuration
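The "two services, one shared limit" shape above can be sketched with std only. The toy `Semaphore` below is a hypothetical stand-in for the shared semaphore a real tower `ConcurrencyLimit` would draw permits from; the point demonstrated is just that two independently-configured stacks (read and write) count against a single pool of permits. Each `call` here acquires and deliberately never releases, simulating an in-flight request.

```rust
use std::sync::{Arc, Mutex};

// Toy counting semaphore, standing in for the shared semaphore the
// real concurrency-limit layer would use.
struct Semaphore {
    permits: Mutex<usize>,
}

impl Semaphore {
    fn new(n: usize) -> Arc<Self> {
        Arc::new(Self { permits: Mutex::new(n) })
    }
    fn try_acquire(&self) -> bool {
        let mut p = self.permits.lock().unwrap();
        if *p > 0 { *p -= 1; true } else { false }
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
    }
}

// Two independent "services" (read stack and write stack) that share
// one limit. Each could carry a different layer stack on top, e.g. a
// timeout layer on reads only.
struct Service {
    sem: Arc<Semaphore>,
    name: &'static str,
}

impl Service {
    // Acquires a permit and holds it, simulating an in-flight call.
    fn call(&self) -> Result<&'static str, &'static str> {
        if self.sem.try_acquire() { Ok(self.name) } else { Err("at concurrency limit") }
    }
}

fn main() {
    let shared = Semaphore::new(2);
    let reads = Service { sem: shared.clone(), name: "read" };
    let writes = Service { sem: shared.clone(), name: "write" };

    assert!(reads.call().is_ok());  // permit 1 of 2
    assert!(writes.call().is_ok()); // permit 2 of 2
    assert!(reads.call().is_err()); // both stacks draw from one pool
    shared.release();               // one in-flight call finishes
    assert!(writes.call().is_ok());
    println!("ok");
}
```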
f
tower has a `Timeout` layer, so that’s a good starting point (only a starting point, since we want metrics, which it doesn’t have hooks for)
@witty-crayon-22786: I wonder if the `retry_call` function in `grpc_util` actually makes the problem worse, since it assumes that the timeout is only for network-level waiting and not the concurrency limit.
hmm maybe not, it only deals with exponential backoff for the sleep between retries, not a timeout on the actual call
https://github.com/pantsbuild/pants/pull/16196 is the draft so far. I don’t like the approach I took, though, in vendoring the Tower `TimeoutLayer`.
I may rewrite to have a custom layer with the sole purpose of detecting when a timeout error is emitted.
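A layer whose sole purpose is observing timeout errors might look like the std-only sketch below. `CallError` and `observe` are hypothetical names, and the atomic counter stands in for whatever metrics sink pants actually uses; a real Tower layer would do this classification in the response future's poll rather than on a plain `Result`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical error type emitted by the timeout middleware below us
// in the stack.
#[derive(Debug, PartialEq)]
enum CallError {
    Elapsed,
    Other(&'static str),
}

// Stand-in metrics sink.
static TIMEOUT_COUNT: AtomicU64 = AtomicU64::new(0);

// The observer layer's whole job: pass results through unchanged, but
// record a metric whenever a timeout error flows past.
fn observe<T>(res: Result<T, CallError>) -> Result<T, CallError> {
    if matches!(&res, Err(CallError::Elapsed)) {
        TIMEOUT_COUNT.fetch_add(1, Ordering::Relaxed);
    }
    res
}

fn main() {
    let _ = observe::<()>(Err(CallError::Elapsed));
    let _ = observe(Ok(42));
    let _ = observe::<()>(Err(CallError::Other("io")));
    // Only the Elapsed error was counted; results were untouched.
    assert_eq!(TIMEOUT_COUNT.load(Ordering::Relaxed), 1);
    println!("ok");
}
```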