Transport error when Agent calls gRPC discover() to some Discovery Handlers
johnsonshih opened this issue
Describe the bug
The Agent hits a transport error when invoking discover() on a Discovery Handler (written in grpc-dotnet) using the UDS endpoint type. The error message from the Agent shows the failure occurs at the protocol level:
[2023-09-28T16:58:55Z TRACE agent::util::discovery_operator] get_stream - endpoint is Uds("/var/lib/akri/opcua-asset.sock")
[2023-09-28T16:58:55Z TRACE agent::util::discovery_operator] get_stream - connecting to external opcua-asset discovery handler over UDS
[2023-09-28T16:58:55Z ERROR agent::util::discovery_operator] get_stream - could not connect to DiscoveryHandler at endpoint Uds("/var/lib/akri/opcua-asset.sock") with error status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }
I did a quick test using the sample code from the tonic gRPC server/client over UDS and the grpc-dotnet server/client over UDS and got the following results:
dotnet Server - tonic Client: FAIL
dotnet Server - dotnet Client: OK
tonic Server - tonic Client: OK
tonic Server - dotnet Client: OK
grpc-dotnet sample code:
grpc-dotnet/examples/Transporter at master · grpc/grpc-dotnet (github.com)
tonic sample code:
tonic/examples/src/uds at master · hyperium/tonic (github.com)
Error message from the tonic UDS client:
Error: Status { code: Internal, message: "h2 protocol error: http2 error: stream error received: unspecific protocol error detected", source: Some(tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: Reset(StreamId(1), PROTOCOL_ERROR, Remote) }))) }
On the server side, the server complains that the Host header is invalid:
Trace id "0HMUACLLJH87R:00000001": HTTP/2 stream error "PROTOCOL_ERROR". A Reset is being sent to the stream.
Microsoft.AspNetCore.Connections.ConnectionAbortedException: Invalid Host header: '[::]:50051'
The Agent uses the URI [::]:50051 when connecting to the gRPC server over the UDS channel; changing the address to [::1]:50051 fixes this issue.
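For context, tonic's UDS pattern routes the actual connection through a custom connector, so the URI passed to Endpoint is used only for the HTTP/2 `:authority` (Host) pseudo-header. Kestrel validates that header and rejects the unspecified address `[::]:50051`, while `[::1]:50051` passes. A minimal sketch of the working client side (based on tonic's uds example, circa tonic 0.9/0.10; newer tonic versions based on hyper 1.x additionally wrap the stream in `hyper_util::rt::TokioIo`) — the socket path is taken from the logs above for illustration:

```rust
use tokio::net::UnixStream;
use tonic::transport::{Channel, Endpoint, Uri};
use tower::service_fn;

async fn connect_uds() -> Result<Channel, tonic::transport::Error> {
    // The host:port here is never dialed; it only becomes the
    // Host/:authority header. "[::1]:50051" is a valid host, whereas
    // "[::]:50051" (the unspecified address) is rejected by Kestrel.
    Endpoint::try_from("http://[::1]:50051")?
        .connect_with_connector(service_fn(|_: Uri| {
            // The real transport: a Unix domain socket connection.
            UnixStream::connect("/var/lib/akri/opcua-asset.sock")
        }))
        .await
}
```

This explains the test matrix: the tonic server does not validate the `:authority` header, so both clients succeed against it, while grpc-dotnet's Kestrel server rejects the tonic client's default `[::]:50051` authority.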
Output of kubectl get pods,akrii,akric -o wide
Kubernetes Version: [e.g. Native Kubernetes 1.19, MicroK8s 1.19, Minikube 1.19, K3s]
To Reproduce
Steps to reproduce the behavior:
- Create cluster using '...'
- Install Akri with the Helm command '...'
- '...'
Expected behavior
Logs (please share snips of applicable logs)
- To get the logs of any pod, run
kubectl logs <pod name>
- To get the logs of a pod that has already terminated,
kubectl logs <pod name> --previous
- If you believe that the problem is with the Kubelet, run
journalctl -u kubelet
or journalctl -u snap.microk8s.daemon-kubelet
if you are using a MicroK8s cluster.
Additional context