DataDog / datadog-lambda-extension

Missing logs and APM traces when a lambda timeout happens

pvicente opened this issue

During some tests my Lambda function timed out after running for more than 15 minutes. The final logs were missing in Datadog, and so were the traces for the whole execution.

I reduced the timeout to 850 seconds and set DD_TRACE_DEBUG=true, but I still couldn't see anything in the logs.

See the screenshots of the latest logs of the Lambda execution in Datadog and CloudWatch (the missing entries are highlighted in CloudWatch: END, REPORT, and the task timeout):

Screenshot 2022-10-27 at 10 35 16

Screenshot 2022-10-27 at 10 18 09

Libraries in use:

datadog-lambda 4.63.0 The Datadog AWS Lambda Library
├── datadog >=0.41,<0.42
│   ├── decorator >=3.3.2 
│   └── requests >=2.6.0 
│       ├── certifi >=2017.4.17 
│       ├── charset-normalizer >=2,<3 
│       ├── idna >=2.5,<4 
│       └── urllib3 >=1.21.1,<1.27 
├── ddtrace >=1.4.1,<2.0.0
│   ├── attrs >=19.2.0 
│   ├── bytecode * 
│   ├── cattrs * 
│   │   ├── attrs >=20 (circular dependency aborted here)
│   │   └── exceptiongroup * 
│   ├── ddsketch >=2.0.1 
│   │   ├── protobuf >=3.0.0 
│   │   └── six * 
│   ├── envier * 
│   ├── jsonschema * 
│   │   ├── attrs >=17.4.0 (circular dependency aborted here)
│   │   └── pyrsistent >=0.14.0,<0.17.0 || >0.17.0,<0.17.1 || >0.17.1,<0.17.2 || >0.17.2 
│   ├── packaging >=17.1 
│   │   └── pyparsing >=2.0.2,<3.0.5 || >3.0.5 
│   ├── protobuf >=3 (circular dependency aborted here)
│   ├── six >=1.12.0 (circular dependency aborted here)
│   ├── tenacity >=5 
│   ├── typing-extensions * 
│   └── xmltodict >=0.12 
└── wrapt >=1.11.2,<2.0.0

Hi - unfortunately this is a tricky situation as the Datadog Lambda Extension is killed alongside a Lambda Function when a function times out.

For Node.js, this issue is fixed today with layer 86 and this feature.

We're working on the same for Python, and will update this issue when completed.

Thanks!

This PR will support sending APM spans before a timeout for Python.
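
As a rough illustration of acting shortly before a timeout, here is a minimal handler-side sketch. It is not the change made in the PR and it does not use any Datadog API. It only relies on `context.get_remaining_time_in_millis()`, which the AWS Lambda Python runtime provides. The names `SAFETY_MARGIN_MS`, `process_items` and `handle`, as well as the `items` key in the event, are hypothetical and exist only for this example.

```python
# Hypothetical safety margin: stop working this long before the hard timeout.
SAFETY_MARGIN_MS = 5_000


def handle(item):
    # Placeholder for the real per-item work (an assumption for this sketch).
    return item * 2


def process_items(items, context, margin_ms=SAFETY_MARGIN_MS):
    """Process items until done or until the Lambda deadline is close.

    Returns (processed, remaining) so the caller can log or re-queue the rest.
    """
    processed = []
    for index, item in enumerate(items):
        # get_remaining_time_in_millis() comes from the Lambda context object.
        if context.get_remaining_time_in_millis() <= margin_ms:
            return processed, list(items[index:])
        processed.append(handle(item))
    return processed, []


def handler(event, context):
    processed, remaining = process_items(event.get("items", []), context)
    if remaining:
        # Logged while the function is still alive, so it is not lost
        # to the hard timeout.
        print(f"Stopping early: {len(remaining)} items left before timeout")
    return {"processed": len(processed), "remaining": len(remaining)}
```

Returning before the deadline lets the invocation end normally instead of being killed, which gives the runtime and any extension a chance to flush what they have buffered. Whether that is enough to recover the missing logs in a given setup is an assumption that would need testing.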

@astuyve So it's just a matter of updating dd-trace-py to version 1.9.0 when it's released? Nothing else? What about the missing logs?

cc: @DylanLovesCoffee for the missing logs.

Any update on this?

With version 44, the APM traces now show the error "Impending Timeout: Datadog detected an Impending Timeout" when this happens, so I can monitor it with the timeout metric and add a link to an APM search for this error. See:

Screenshot 2023-07-18 at 11 07 19 Screenshot 2023-07-18 at 11 08 04