openzipkin / zipkin4net

A .NET client library for Zipkin

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The timer in MutableSpanMap have not proper handling span data.

a14907 opened this issue · comments

I have catch this exception in my product env:

   Unhandled exception. System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at zipkin4net.Tracers.Zipkin.WriterExtensions.WriteList[U](StreamWriter writer, SerializeMethod`1 serializer, String fieldName, ICollection`1 elements, IPEndPoint endPoint, String serviceName)
   at zipkin4net.Tracers.Zipkin.JSONSpanSerializer.SerializeSpan(StreamWriter writer, Span span)
   at zipkin4net.Tracers.Zipkin.JSONSpanSerializer.SerializeTo(Stream stream, Span span)
   at OpenPlatform.AspNetCore.Infrastructure.CusJSONSpanSerializer.SerializeTo(Stream stream, Span span) in /src/PlaformServices/Libraries/OpenPlatform.AspNetCore/Infrastructure/ZipkinStartup.cs:line 88
   at zipkin4net.Tracers.Zipkin.ZipkinTracerReporter.Report(Span span)
   at zipkin4net.Internal.Recorder.MutableSpanMap.FlushOldSpans(DateTime utcNow)
   at zipkin4net.Internal.Recorder.MutableSpanMap.<.ctor>b__6_0(Object _)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.TimerQueueTimer.CallCallback(Boolean isThreadPool)
   at System.Threading.TimerQueueTimer.Fire(Boolean isThreadPool)
   at System.Threading.TimerQueue.FireNextTimers()

I review the source code . and i know this is the problem of the System.Threading.Timer. This timer won't wait the pre task to finish ,and will start next task. That is the problem.

To verification this, I output the span when exception raise. This is show below:

    "spanState": {
        "traceIdHigh": 0,
        "traceId": -7806682531417472614,
        "parentSpanId": 6104606790270792834,
        "spanId": -3199394417785887997,
        "samplingStatus": 1,
        "flags": 0,
        "sampled": true,
        "debug": false,
        "extra": []
    },
    "annotations": [
        {
            "value": "sr",
            "timestamp": "2021-12-13T09:44:18.4564427Z"
        },
        {
            "value": "flush.timeout",
            "timestamp": "2021-12-13T09:49:14.2332799Z"
        },
        {
            "value": "flush.timeout",
            "timestamp": "2021-12-13T09:49:14.2332922Z"
        }
    ],
.....

There are two "flush.timeout" annotations, their timestamp are very close.

Maybe you should consider when to trigger the timer task.Please fix it.