Hey, I'm trying to use the pants remote caching wi...
# general
b
Hey, I'm trying to use the pants remote caching with bazel-remote. I set up bazel-remote in our kubernetes cluster with an ALB ingress in AWS. Every time I try to run pants now, I get the error "Failed to read from remote cache (8 occurrences so far): Unavailable: "error trying to connect: invalid certificate: UnknownIssuer"". I added our custom CA cert to pants with "remote_ca_certs_path", and if I test the connection to the ALB with "openssl s_client" using this cert, it works flawlessly. I could also rule out the grpc service itself, because if I expose the service port locally (without ssl) and let pants run against the exposed local port, it works just fine. So I guess there is a problem somewhere in how pants establishes this ssl connection
p
Hi, it sounds exactly like the issue I had (and still have...) https://pantsbuild.slack.com/archives/C046T6T9U/p1687525899603529
I didn't get much further with that. I think the next step would be to figure out where pants sets up that ssl grpc channel and debug that, but I'm not sure if that's doable...
b
I just checked the pantsd.log. There is actually a bit more logging there than on stdout. I could see that pants is trying to connect, but it also seems like it's connecting to the ip address instead of the dns hostname. I'm not sure how well that works with an ALB, or whether pants also sends the hostname in the request
12:16:23.79 [DEBUG] error: error trying to connect: invalid certificate: UnknownIssuer
12:16:23.79 [DEBUG] updating from discover
12:16:23.79 [DEBUG] updating from discover
12:16:23.79 [DEBUG] updating from discover
12:16:23.79 [DEBUG] connecting to 100.xxx.xxx.xxx:9092
12:16:23.79 [DEBUG] updating from discover
12:16:23.79 [DEBUG] updating from discover
12:16:23.79 [DEBUG] connected to 100.xxx.xxx.xxx:9092
12:16:23.79 [DEBUG] No cached session for DNSNameRef("example.org")
12:16:23.79 [DEBUG] Not resuming any session
12:16:23.79 [DEBUG] updating from discover
12:16:23.97 [DEBUG] updating from discover
12:16:23.97 [DEBUG] ALPN protocol is Some(b"h2")
12:16:23.97 [DEBUG] Using ciphersuite TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
12:16:23.97 [DEBUG] Server supports tickets
12:16:23.97 [DEBUG] updating from discover
12:16:23.97 [DEBUG] updating from discover
12:16:23.97 [DEBUG] updating from discover
12:16:23.98 [DEBUG] updating from discover
12:16:23.98 [DEBUG] ECDHE curve is ECParameters { curve_type: NamedCurve, named_group: secp256r1 }
12:16:23.98 [DEBUG] Server DNS name is DNSName("example.org")
12:16:23.98 [WARN] Sending fatal alert BadCertificate
12:16:23.98 [DEBUG] reconnect::poll_ready: hyper::Error(Connect, Custom { kind: InvalidData, error: WebPKIError(UnknownIssuer) })
12:16:23.98 [DEBUG] service.ready=true processing request
12:16:23.98 [DEBUG] error: error trying to connect: invalid certificate: UnknownIssuer
12:16:23.98 [DEBUG] buffer closing; waking pending tasks
12:16:23.98 [DEBUG] buffer closing; waking pending tasks
12:16:23.98 [DEBUG] buffer closing; waking pending tasks
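To make the ip-vs-hostname question easier to read off a log like the one above, here is a small Python sketch that pulls out the TCP connect target, the server name used in the TLS handshake, and any fatal alert. The function name summarize and the regexes are made up for illustration and only match the message formats shown in this excerpt. In this excerpt the TCP connect goes to the ip while the TLS session is keyed on the DNS name, which suggests (but does not prove) that the hostname is what gets used for the handshake.

```python
import re

# Slack exports wrap auto-linked text as <url|label>; keep only the label.
SLACK_LINK = re.compile(r"<[^|>]+\|([^>]+)>")

# Patterns for the three message formats seen in the excerpt (assumed stable).
PATTERNS = {
    "connect_target": re.compile(r"connecting to (\S+)"),
    "tls_server_name": re.compile(r'DNSNameRef\("([^"]+)"\)'),
    "fatal_alert": re.compile(r"Sending fatal alert (\w+)"),
}


def summarize(lines):
    """Return the first connect target, TLS server name and fatal alert found."""
    summary = {key: None for key in PATTERNS}
    for raw in lines:
        line = SLACK_LINK.sub(r"\1", raw)
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match and summary[key] is None:
                summary[key] = match.group(1)
    return summary
```

Usage would be something like summarize(open(path_to_pantsd_log)), where path_to_pantsd_log is wherever your pantsd.log lives.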
p
running with --log-show-rust-3rdparty -ldebug can give you more output as well
b
Yea, did that, but those logs only show up in the pantsd.log file. Not on the cli
p
right; I can't help much here unfortunately 😕 Is there a chance that openssl s_client actually uses some certificate present on your machine from the default location, instead of the same one that pants is pointed to with the remote_ca_certs_path option?
b
No, if I'm not using the custom certificate, openssl also complains about a self-signed certificate in the chain. It only stops complaining if I explicitly set the CA file which pants is also using
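For anyone who wants to repeat this check without openssl, here is a minimal Python sketch that trusts only the given CA file (no system roots) and sends the hostname via SNI, roughly like openssl s_client with -CAfile and -servername. The function names, host, port and CA path are placeholders. It says nothing about how pants itself builds its TLS connection, and Python's OpenSSL-based verification may be more lenient than the Rust webpki verifier that shows up in the pantsd.log, so a pass here does not guarantee pants will accept the chain.

```python
import socket
import ssl


def make_strict_context(ca_file=None):
    """Client context that verifies certs and hostnames, with no system roots loaded."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED + hostname check by default
    ctx.set_alpn_protocols(["h2"])  # gRPC runs over HTTP/2
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)  # trust only this CA bundle
    return ctx


def check_tls(host, port, ca_file, timeout=5.0):
    """Handshake with host:port; raises ssl.SSLCertVerificationError if the chain is not trusted."""
    ctx = make_strict_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sets SNI and is the name the certificate is checked against
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return {
                "version": tls.version(),
                "alpn": tls.selected_alpn_protocol(),
                "subject": tls.getpeercert().get("subject"),
            }
```

Calling check_tls("cache.example.org", 443, "/path/to/ca.pem") (all three values are placeholders) should either return the negotiated details or raise a verification error.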
Did you manage to test the endpoint with grpcurl? I'm honestly far away from understanding how this whole grpc stuff works
p
I think I tried, but for grpcurl you need to have reflection enabled, and I don't have access to changing the setup of the proxy that we use to bazel cache
that's why I tried setting up bazel locally and building some helloworld project, instead of pants; that worked, so I assumed that certificates and the overall setup that my devops team did was ok...
but I have a slightly different issue: I just get a "transport error", as if the certificate was read properly but just doesn't match what the service expects (?)
I will have more time next week to get back to this...
b
Are you also using AWS?
p
yes
b
I always got a transport error if I tried to connect with http
AWS ALB requires the listener port to be HTTPS
p
hmm, I've used grpcs:// in the setup so I assume that's https. If I use the wrong certificate on purpose I get an auth error, so I think that part is ok
h
If you figure out how to get this set up, a short writeup would be great! We can publish it in the docs for future reference.
b
Sorry for the late reply. For me, I ditched the ingress completely and am keeping the cache inside the cluster. So I'm talking to the grpc service via the k8s-internal svc.cluster.local address, which also avoids https entirely. For my use-case this works, but it's not really nice and I couldn't figure out why the ingress didn't work
h
Thanks for the followup
This would be good to figure out at some point