Missing logs and APM traces when a lambda timeout happens
pvicente opened this issue
During some tests, one of my Lambda invocations timed out after running for more than 15 minutes. The final logs were missing in Datadog, as were the traces for the whole execution.
I then reduced the timeout to 850 seconds and set DD_TRACE_DEBUG=true,
but I still couldn't see anything in the logs.
See screenshots of the latest logs of the Lambda execution in Datadog and CloudWatch (highlighted are the ones missing in CloudWatch: END, REPORT, and task timeout):
Libraries in use:
datadog-lambda 4.63.0 The Datadog AWS Lambda Library
├── datadog >=0.41,<0.42
│ ├── decorator >=3.3.2
│ └── requests >=2.6.0
│ ├── certifi >=2017.4.17
│ ├── charset-normalizer >=2,<3
│ ├── idna >=2.5,<4
│ └── urllib3 >=1.21.1,<1.27
├── ddtrace >=1.4.1,<2.0.0
│ ├── attrs >=19.2.0
│ ├── bytecode *
│ ├── cattrs *
│ │ ├── attrs >=20 (circular dependency aborted here)
│ │ └── exceptiongroup *
│ ├── ddsketch >=2.0.1
│ │ ├── protobuf >=3.0.0
│ │ └── six *
│ ├── envier *
│ ├── jsonschema *
│ │ ├── attrs >=17.4.0 (circular dependency aborted here)
│ │ └── pyrsistent >=0.14.0,<0.17.0 || >0.17.0,<0.17.1 || >0.17.1,<0.17.2 || >0.17.2
│ ├── packaging >=17.1
│ │ └── pyparsing >=2.0.2,<3.0.5 || >3.0.5
│ ├── protobuf >=3 (circular dependency aborted here)
│ ├── six >=1.12.0 (circular dependency aborted here)
│ ├── tenacity >=5
│ ├── typing-extensions *
│ └── xmltodict >=0.12
└── wrapt >=1.11.2,<2.0.0
Hi - unfortunately this is a tricky situation as the Datadog Lambda Extension is killed alongside a Lambda Function when a function times out.
For Node.js, this issue is fixed today with layer 86 and this feature.
We're working on the same for Python, and will update this issue when completed.
Thanks!
This PR will add support for sending APM spans before a timeout for Python.
cc: @DylanLovesCoffee for the missing logs.
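For anyone hitting this in the meantime, the general idea of "do something just before the timeout" can be sketched in plain Python. This is only an illustration, not the code from the PR above and not part of datadog-lambda or ddtrace. The names `run_with_timeout_guard`, `on_near_timeout`, and `buffer_ms` are hypothetical; the only real API assumed is `context.get_remaining_time_in_millis()` from the AWS Lambda Python context object.

```python
import threading


def run_with_timeout_guard(handler, event, context, on_near_timeout, buffer_ms=500):
    """Run handler(event, context) and call on_near_timeout shortly before the deadline.

    Hypothetical helper for illustration only; it is not a Datadog API.
    """
    # get_remaining_time_in_millis() is provided by the Lambda context object.
    remaining_ms = context.get_remaining_time_in_millis()
    delay_s = max(remaining_ms - buffer_ms, 0) / 1000.0

    # Fire the callback buffer_ms before Lambda would kill the invocation.
    timer = threading.Timer(delay_s, on_near_timeout)
    timer.daemon = True
    timer.start()
    try:
        return handler(event, context)
    finally:
        # If the handler finished (or raised) in time, the callback is not needed.
        timer.cancel()
```

In a real function, `on_near_timeout` could write a final log line or trigger whatever flush the tracing setup offers; which flush call is appropriate is not confirmed in this thread. Note that the callback runs in a separate thread, so it cannot interrupt the handler, and it only helps if `buffer_ms` leaves enough time for the data to actually leave the sandbox.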
Any update on this?