spire-agent health check report spire_agent_rpc_workload_api_fetch_x509_bundles{status="PermissionDenied"} metrics

Question

penghuazhou opened this issue 4 months ago

The spire-agent health check invokes the method sendX509BundlesResponse. Because there is no SPIRE registration entry that selects the spire-agent pod itself, the call returns codes.PermissionDenied, and the agent reports the metric spire_agent_rpc_workload_api_fetch_x509_bundles{status="PermissionDenied"}. As a result, when the same error occurs for a normal, non-health-check request, I cannot distinguish between the two cases.

func sendX509BundlesResponse(update *cache.WorkloadUpdate, stream workload.SpiffeWorkloadAPI_FetchX509BundlesServer, log logrus.FieldLogger, allowUnauthenticatedVerifiers bool, previousResponse *workload.X509BundlesResponse, quietLogging bool) (*workload.X509BundlesResponse, error) {
    // A caller with no matching registration entry has no identity.
    // The agent's own health check takes this path and gets PermissionDenied.
    if !allowUnauthenticatedVerifiers && !update.HasIdentity() {
        if !quietLogging {
            log.WithField(telemetry.Registered, false).Error("No identity issued")
        }
        return nil, status.Error(codes.PermissionDenied, "no identity issued")
    }
    resp, err := composeX509BundlesResponse(update)
    if err != nil {
        log.WithError(err).Error("Could not serialize X509 bundle response")
        return nil, status.Errorf(codes.Unavailable, "could not serialize response: %v", err)
    }
    // Nothing changed since the last response, so there is nothing new to send.
    if proto.Equal(resp, previousResponse) {
        return previousResponse, nil
    }
    // ... remainder of the function omitted

Version: v1.9.6
Platform: linux-amd64
Subsystem: spire-agent

Agustín Martínez Fayó · Answer 1 · Wed Aug 07 2024 02:17:09 GMT+0800 (China Standard Time)

Thank you for opening this issue, @penghuazhou.
I think the health check shouldn't interfere with the metrics. We already filter the logs when the request comes from the agent's own PID, and I think we should do the same for the metrics.
I believe we will need to handle this at the middleware level, so that no metrics are emitted for requests made by the agent itself.
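
To illustrate the idea, here is a minimal sketch of a middleware-level filter. It is not SPIRE's actual middleware: Handler, CallCounter, WithMetrics, statusOf, and errPermissionDenied are hypothetical names invented for this example, and the caller PID is passed in directly as a plain argument, whereas a real agent would have to obtain it from the connection. The only behavior it is meant to show is that a request whose caller PID matches the agent's own PID is served as usual but never counted.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Handler stands in for a Workload API RPC handler (hypothetical type).
// It receives the PID of the calling process.
type Handler func(callerPID int) error

// CallCounter stands in for the agent's metrics sink (hypothetical type).
// It counts calls keyed by "method/status".
type CallCounter map[string]int

// errPermissionDenied is a stand-in for the "no identity issued" error.
var errPermissionDenied = errors.New("no identity issued")

// statusOf maps a handler error to a status label for the metric.
func statusOf(err error) string {
	switch {
	case err == nil:
		return "OK"
	case errors.Is(err, errPermissionDenied):
		return "PermissionDenied"
	default:
		return "Unknown"
	}
}

// WithMetrics wraps next so that every call is counted under its method and
// status, except calls made by the agent process itself (selfPID).
func WithMetrics(method string, selfPID int, counter CallCounter, next Handler) Handler {
	return func(callerPID int) error {
		err := next(callerPID)
		if callerPID == selfPID {
			// Request issued by the agent itself (e.g. its health check):
			// return the result unchanged but emit no metric.
			return err
		}
		counter[method+"/"+statusOf(err)]++
		return err
	}
}

func main() {
	counter := CallCounter{}
	// Stand-in handler that always denies, as for a caller with no entry.
	deny := func(int) error { return errPermissionDenied }
	h := WithMetrics("fetch_x509_bundles", os.Getpid(), counter, deny)

	_ = h(os.Getpid())     // the agent's own health check: not counted
	_ = h(os.Getpid() + 1) // some other workload: counted

	fmt.Println(counter["fetch_x509_bundles/PermissionDenied"])
}
```

With a filter like this, a PermissionDenied count would reflect only requests from real workloads, which is the distinction asked for in the question.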