cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.

Home Page: https://cortexmetrics.io/

"context canceled" is Added as a Span Event on `cortex.ingester/QueryStream` Trace

kennytrytek-wf opened this issue

Describe the bug
When a QueryStream operation ends because its context was canceled, the "context canceled" error is recorded as a span event on the trace.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex with tracing enabled.
  2. Run a query that the distributor sends to more than one ingester.

Expected behavior
If an ingester context is canceled during the query (which I understand is normal operation for Cortex?), the operation should result in an OK span status with no attached span event.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm

Additional Context

The span in question:

spanlog, ctx := spanlogger.New(stream.Context(), "QueryStream")
defer spanlog.Finish()

Similar issue from the past:
#1279

How it was fixed in WeaveWorks:
https://github.com/weaveworks/common/pull/148/files

An example of a failing trace. In this example, there were five parallel query streams, and the one that was canceled was the slowest.
[Image: Screenshot 2023-12-07 at 2 42 26 PM]

And its span event:
[Image: Screenshot 2023-12-07 at 2 42 43 PM]

I think the span /cortex.Ingester/QueryStream is created automatically by the gRPC tracing middleware. It does not come from this code path:

spanlog, ctx := spanlogger.New(stream.Context(), "QueryStream")
defer spanlog.Finish()

We would need to change the gRPC middleware library so that it ignores the context canceled error. I am not sure whether that is something we can do easily.

As a workaround, I added a transform processor to our Mimir OpenTelemetry collector that watches for this case and sets the span status to OK.

processors:
  transform/cortexquerycontextcanceledspanevent:
    error_mode: ignore
    trace_statements:
      - context: spanevent
        statements:
          - set(span.status.code, 1) where (span.name == "/cortex.Ingester/QueryStream" and attributes["message"] == "context canceled")