gogama / incite

Hassle-free queries on Amazon CloudWatch Logs Insights in Go

Retry API calls when the CWL API response payload can't be deserialized

vcschapp opened this issue

User Story

As a developer building analytics using Incite, I want confidence that my CWL Insights queries will not be aborted due to a transient network failure.

In particular, I don't want my queries to fail with an error like the one below when retrying the CloudWatch Logs API request that produced it would have succeeded:

ERROR [incite: query ID "7e623cab-90dc-4417-97ac-d5e728c57ae8" had unexpected error [query text "<some query>"]: SerializationError: failed to unmarshal response error
        status code: 503, request id: 744CFBE1FEEAB934
caused by: UnmarshalError: error message missing]

Details

Having seen this error several times, I believe the above UnmarshalError represents a transient HTTP problem, and that the failed request would succeed on retry. In this instance it seems the CloudWatch Logs service wanted to return HTTP 503 (Service Unavailable), but for some reason:

  • either the response payload containing the error message JSON got truncated, leading to a failure to deserialize; or
  • the HTTP 503 error emanated from a component that erroneously doesn't produce a proper response body.

In fact, looking at it more closely, the message emanates from unmarshal_error.go, which in turn relies on unmarshal.go: on encountering an io.EOF, that code returns the UnmarshalError with the message "error message missing".

So the root cause was the remote host or load balancer closing the connection and/or sending back an empty response body.
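
To illustrate the distinction between the two cases above, here is a minimal sketch using only Go's standard encoding/json package. It is not the SDK's code: the errorBody type and the classifyBody function are hypothetical names made up for this example. It shows that decoding an empty body ends in io.EOF, while decoding a truncated body ends in io.ErrUnexpectedEOF.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errorBody is a hypothetical stand-in for a JSON error payload.
type errorBody struct {
	Message string `json:"message"`
}

// classifyBody decodes body as JSON and reports how decoding ended.
func classifyBody(body string) string {
	var v errorBody
	err := json.NewDecoder(strings.NewReader(body)).Decode(&v)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, io.EOF):
		// No bytes at all: the body was empty.
		return "empty body (io.EOF)"
	case errors.Is(err, io.ErrUnexpectedEOF):
		// Input ended in the middle of a JSON value.
		return "truncated body (io.ErrUnexpectedEOF)"
	default:
		return "other error"
	}
}

func main() {
	fmt.Println(classifyBody(`{"message":"Service Unavailable"}`))
	fmt.Println(classifyBody(""))
	fmt.Println(classifyBody(`{"message":"Serv`))
}
```

Since the error in this issue carries the "error message missing" text, the empty-body case is the one that matches the io.EOF path described above.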

Fix

This can be fixed by slightly enhancing isTransient so that it also treats this kind of deserialization failure as retryable.

This is fixed as of commit 75d9c6b and will be released as part of v1.3.0, sometime in the next 1-3 weeks.
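
For readers curious what such a check might look like, here is a rough sketch. It is not necessarily what the commit above does. The codedError interface is declared locally and only mirrors the Code and OrigErr methods of awserr.Error in the AWS SDK for Go v1, so that the example compiles without the SDK. fakeErr and isTransientSketch are hypothetical names. The sketch also assumes it is acceptable to retry any SerializationError whose cause is an UnmarshalError.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError mirrors two methods of awserr.Error (AWS SDK for Go v1).
// It is declared here only so the sketch is self-contained.
type codedError interface {
	error
	Code() string
	OrigErr() error
}

// fakeErr is a hypothetical test double implementing codedError.
type fakeErr struct {
	code, msg string
	orig      error
}

func (e *fakeErr) Error() string  { return e.code + ": " + e.msg }
func (e *fakeErr) Code() string   { return e.code }
func (e *fakeErr) OrigErr() error { return e.orig }

// isTransientSketch reports whether err looks like a response that
// could not be deserialized: a SerializationError caused by an
// UnmarshalError. Assumption: such failures are safe to retry.
func isTransientSketch(err error) bool {
	var outer codedError
	if !errors.As(err, &outer) || outer.Code() != "SerializationError" {
		return false
	}
	var cause codedError
	return errors.As(outer.OrigErr(), &cause) && cause.Code() == "UnmarshalError"
}

func main() {
	err := &fakeErr{
		code: "SerializationError",
		msg:  "failed to unmarshal response error",
		orig: &fakeErr{code: "UnmarshalError", msg: "error message missing"},
	}
	fmt.Println(isTransientSketch(err))
	fmt.Println(isTransientSketch(errors.New("some other failure")))
}
```

A stricter variant could additionally require a 5xx HTTP status code before retrying. That is a design choice this sketch leaves open.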