Transport error when Agent calls gRPC discover() to some Discovery Handlers
johnsonshih opened this issue
Describe the bug
The Agent hits a transport error when invoking discover() on a Discovery Handler (written in grpc-dotnet) using the UDS endpoint type. The error message from the Agent shows the failure occurs at the protocol level:
[2023-09-28T16:58:55Z TRACE agent::util::discovery_operator] get_stream - endpoint is Uds("/var/lib/akri/opcua-asset.sock")
[2023-09-28T16:58:55Z TRACE agent::util::discovery_operator] get_stream - connecting to external opcua-asset discovery handler over UDS
[2023-09-28T16:58:55Z ERROR agent::util::discovery_operator] get_stream - could not connect to DiscoveryHandler at endpoint Uds("/var/lib/akri/opcua-asset.sock") with error status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }
I did a quick test using the sample code from the tonic gRPC server/client over UDS and the grpc-dotnet server/client over UDS and got the following results:
dotnet Server - tonic Client: FAIL
dotnet Server - dotnet Client: OK
tonic Server - tonic Client: OK
tonic Server - dotnet Client: OK
grpc-dotnet sample code:
grpc-dotnet/examples/Transporter at master · grpc/grpc-dotnet (github.com)
tonic sample code:
tonic/examples/src/uds at master · hyperium/tonic (github.com)
Error message from the tonic UDS client:
Error: Status { code: Internal, message: "h2 protocol error: http2 error: stream error received: unspecific protocol error detected", source: Some(tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: Reset(StreamId(1), PROTOCOL_ERROR, Remote) }))) }
On the server side, the server complains that the Host header is invalid:
Trace id "0HMUACLLJH87R:00000001": HTTP/2 stream error "PROTOCOL_ERROR". A Reset is being sent to the stream.
Microsoft.AspNetCore.Connections.ConnectionAbortedException: Invalid Host header: '[::]:50051'
The Agent uses the URI [::]:50051 when connecting to the gRPC server over the UDS channel; changing the address to [::1]:50051 fixes this issue.
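For context, tonic's UDS pattern routes the actual connection through a custom connector, so the URI passed to Endpoint is used only for the HTTP/2 `:authority` (Host) pseudo-header. Kestrel validates that header and rejects the unspecified address `[::]:50051`, while `[::1]:50051` passes. A minimal sketch of the working client side (based on tonic's uds example, circa tonic 0.9/0.10; newer tonic versions based on hyper 1.x additionally wrap the stream in `hyper_util::rt::TokioIo`) — the socket path is taken from the logs above for illustration:

```rust
use tokio::net::UnixStream;
use tonic::transport::{Channel, Endpoint, Uri};
use tower::service_fn;

async fn connect_uds() -> Result<Channel, tonic::transport::Error> {
    // The host:port here is never dialed; it only becomes the
    // Host/:authority header. "[::1]:50051" is a valid host, whereas
    // "[::]:50051" (the unspecified address) is rejected by Kestrel.
    Endpoint::try_from("http://[::1]:50051")?
        .connect_with_connector(service_fn(|_: Uri| {
            // The real transport: a Unix domain socket connection.
            UnixStream::connect("/var/lib/akri/opcua-asset.sock")
        }))
        .await
}
```

This explains the test matrix: the tonic server does not validate the `:authority` header, so both clients succeed against it, while grpc-dotnet's Kestrel server rejects the tonic client's default `[::]:50051` authority.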
Output of kubectl get pods,akrii,akric -o wide
Kubernetes Version: [e.g. Native Kubernetes 1.19, MicroK8s 1.19, Minikube 1.19, K3s]
To Reproduce
Steps to reproduce the behavior:
- Create cluster using '...'
- Install Akri with the Helm command '...'
- '...'
Expected behavior
Logs (please share snips of applicable logs)
- To get the logs of any pod, run
kubectl logs <pod name>
- To get the logs of a pod that has already terminated,
kubectl logs <pod name> --previous
- If you believe that the problem is with the Kubelet, run
journalctl -u kubelet
or journalctl -u snap.microk8s.daemon-kubelet
if you are using a MicroK8s cluster.
Additional context