m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform

Home Page: https://m3db.io/

Disconnected Traces between M3Query and M3DBNode

albertteoh opened this issue

We came across a situation where a single request to an M3 endpoint resulted in two traces being created when we expected a single trace. An example of the traces resulting from a request to the query_range endpoint is shown below; I've circled what I believe should be the continuation points of each trace:

Screen Shot 2021-01-28 at 8 19 53 pm
Screen Shot 2021-01-28 at 8 22 49 pm

I suspect the trace is broken because the trace context is lost; it is not passed down, and a new Context is created within an async operation on this line of code: https://github.com/m3db/m3/blob/master/src/dbnode/client/host_queue.go#L879
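
To illustrate the suspected failure mode, here's a minimal Go sketch (hypothetical function names, not the actual host_queue.go code): starting a span from a fresh context.Background() inside a goroutine severs the link to the caller's span, so Jaeger reports a second, disconnected trace, whereas passing the caller's ctx keeps everything in one trace.

package hostqueue

import (
    "context"
    "fmt"

    "github.com/opentracing/opentracing-go"
)

// drainAsyncBroken is a made-up stand-in for the async drain: the span is
// started from context.Background(), which carries no parent span, so the
// spans created below it form a brand-new trace.
func drainAsyncBroken(ops []string) {
    go func() {
        span, ctx := opentracing.StartSpanFromContext(context.Background(), "drain")
        defer span.Finish()
        process(ctx, ops)
    }()
}

// drainAsyncFixed passes the caller's ctx into the goroutine, so the "drain"
// span becomes a child of whatever span ctx already carries and both halves
// show up as a single trace.
func drainAsyncFixed(ctx context.Context, ops []string) {
    go func() {
        span, ctx := opentracing.StartSpanFromContext(ctx, "drain")
        defer span.Finish()
        process(ctx, ops)
    }()
}

func process(ctx context.Context, ops []string) {
    // This child span attaches to whatever span ctx carries; in the broken
    // variant above that is a fresh trace rather than the request's trace.
    span, _ := opentracing.StartSpanFromContext(ctx, "process")
    defer span.Finish()
    for _, op := range ops {
        fmt.Println("processing", op)
    }
}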

1. What service is experiencing the issue? (M3Coordinator, M3DB, M3Aggregator, etc)

This example is specific to M3Query -> M3DB, but could apply to other services.

2. What is the configuration of the service? Please include any YAML files, as well as namespace / placement configuration (with any sensitive information anonymized if necessary).

m3query.yml

listenAddress: 0.0.0.0:7202

tracing:
  backend: jaeger
  jaeger:
    sampler:
      type: remote

clusters:
  - namespaces:
      - namespace: default
        type: unaggregated
        retention: 48h
    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:
                - 127.0.0.1:2379
metrics:
  scope:
    prefix: "query"
  prometheus:
    handlerPath: /metrics
    listenAddress: 0.0.0.0:7204 # until https://github.com/m3db/m3/issues/682 is resolved
  sanitization: prometheus
  samplingRate: 1.0
  extended: none

m3dbnode.yml

coordinator:
  tracing:
    backend: jaeger
    jaeger:
      sampler:
        type: const
        param: 1
db:
  tracing:
    backend: jaeger
    jaeger:
      sampler:
        type: remote

3. How are you using the service? For example, are you performing reads/writes to the service via Prometheus, or are you using a custom script?

Custom script to read and write.

write_sample_data.sh

curl -X POST http://localhost:7201/api/v1/json/write -d '{
  "tags":
    {
      "__name__": "third_avenue",
      "city": "boston",
      "checkout": "1"
    },
    "timestamp": '\"$(date "+%s")\"',
    "value": 3347.26
}'

# Insert tagged data
curl http://localhost:9003/writetagged -s -X POST -d '{
  "namespace": "default",
  "id": "foo",
  "tags": [
    {
      "name": "__name__",
      "value": "user_login"
    },
    {
      "name": "city",
      "value": "new_york"
    },
    {
      "name": "endpoint",
      "value": "/request"
    }
  ],
  "datapoint": {
    "timestamp":'"$(date +"%s")"',
    "value": 42.123456789
  }
}'

query_sample_data.sh

curl -X "POST" -G "http://localhost:7202/api/v1/query_range" \
  -d "query=third_avenue" \
  -d "start=$( date -v -45S +%s )" \
  -d "end=$( date +%s )" \
  -d "step=5s" | jq .

4. Is there a reliable way to reproduce the behavior? If so, please provide detailed instructions.
    1. Start Jaeger all-in-one. I personally ran each component from source as I wanted to filter out Node::health traces.

    2. Run m3dbnode:

      make m3dbnode
      
      sudo ./bin/m3dbnode -f m3dbnode.yml
      
    3. Run m3query:

      make m3query
      
      sudo ./bin/m3query -f m3query.yml
      
    4. Write some data:

      ./write_sample_data.sh
      
    5. Query the data:

      ./query_sample_data.sh
      
    6. Search for m3query traces in Jaeger: http://localhost:16686. Results in two traces for the single query:
      Screen Shot 2021-01-28 at 7 45 33 pm

    7. Drilling into each trace, we see that one trace is a continuation of the other as in the screenshots at the top of this Issue.

@arnikola -- any thoughts?

@gibbscullen this is a real pain point with an easy fix.
How can we make progress on it?

@nir-logzio -- we plan to look into this; however, feel free to make a contribution or suggestion in the meantime. We will be happy to review.

I actually had a quick go at it; the fix is conceptually simple: copy the original context over to each successive function call.

However, the line of code I highlighted in the description is fairly high up in the call stack, and there are a number of calls below it that also don't pass context. This led to a fan-out of changes, with more and more functions needing context passed in (and so on), and it turned into a bit of a mess, so I decided to discontinue my efforts at that point.
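
To give a feel for the plumbing involved, here's a rough before/after sketch with made-up names (this is not the real m3db call stack): each function between the request handler and the point where spans are created has to grow a ctx parameter, which is exactly the fan-out described above.

package hostqueue

import "context"

type op struct{ name string }

type queue struct{}

// Before: the context stops at drain, so anything further down the call
// stack has no parent span to attach to.
func (q *queue) drain(ops []op) {
    for _, o := range ops {
        q.execute(o)
    }
}

func (q *queue) execute(o op) {
    // ... no ctx available here
}

// After: ctx is accepted and forwarded. Every callee then needs the same
// first parameter, and so do its callees, hence the cascade of signature
// changes.
func (q *queue) drainWithContext(ctx context.Context, ops []op) {
    for _, o := range ops {
        q.executeWithContext(ctx, o)
    }
}

func (q *queue) executeWithContext(ctx context.Context, o op) {
    // ... ctx (and the span it carries) can now be handed further down
}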

It's quite likely I was going about it the wrong way. We'd be happy to contribute, but some guidance would be appreciated; especially on whether there is a better approach than my attempt described above.

@albertteoh - thanks for the update! Would be great if you were able to contribute - we'd be happy to review / provide guidance.

Thanks @gibbscullen, would anyone be able to provide guidance based on my approach above? i.e. was it the right way to go about fixing the problem or is there a better approach?

@albertteoh this PR propagates the context correctly, although I'm not sure if it will pass along the trace ID. Technically opentracing should do the right thing, but I have not tested it with the PR: #3125

Thanks @robskillington, I believe opentracing should do the right thing. Looking forward to the PR being merged and released for us to try out!
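
For reference, a minimal sketch (hypothetical operation names, assuming the opentracing-go API) of why propagating the context should be sufficient: StartSpanFromContext derives the new span from the span already stored in the context, so the child keeps the same trace ID and the Jaeger UI groups the spans into one trace.

package tracing

import (
    "context"

    "github.com/opentracing/opentracing-go"
)

// childOf starts a span as a child of whatever span ctx carries; same trace
// ID, new span ID.
func childOf(ctx context.Context) {
    span, ctx := opentracing.StartSpanFromContext(ctx, "dbnode.fetch")
    defer span.Finish()

    // Any callee that receives this ctx keeps extending the same trace.
    inspect(ctx)
}

func inspect(ctx context.Context) {
    if parent := opentracing.SpanFromContext(ctx); parent != nil {
        // parent.Context() is the SpanContext carrying the trace ID.
        _ = parent.Context()
    }
}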

Closing since #3125 has been merged.