resource for clients that ignore the response body

Question

danielbcorreia opened this issue 7 years ago

One thing that seems to be missing is a reference to monitoring systems that rely only on the response status code to make decisions. One such client is the common load balancer, which completely ignores the response body and looks only at the response status code to decide whether or not a node should remain in the pool.

To avoid making the current resource diverge from the correct usage of status codes (see also #4), one option is to have a specific resource that handles this behaviour and returns a 200 OK when the service is healthy, or a 5xx when it is failing.
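A minimal sketch of such a status-only resource, using Python's standard `http.server`. The path `/health/status`, the `check_service` probe, and the choice of 503 as the specific 5xx code are assumptions made for illustration; nothing in this thread prescribes them.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_service() -> bool:
    """Hypothetical probe; a real service would ping its database, queue, etc."""
    return True


def liveness_status(healthy: bool) -> int:
    """Map overall health to a status code that body-ignoring clients can act on."""
    # 503 is an assumed choice; the proposal only asks for "a 5xx".
    return 200 if healthy else 503


class StatusOnlyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health/status":  # hypothetical path
            self.send_error(404)
            return
        # No body at all: the status line carries the whole answer.
        self.send_response(liveness_status(check_service()))
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the status-only resource until interrupted."""
    HTTPServer(("127.0.0.1", port), StatusOnlyHandler).serve_forever()
```

Calling `serve()` starts the server; a load balancer would then be pointed at `/health/status`, while the richer health document stays on its own resource with ordinary status code semantics.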

In my view this is very important as one of the main reasons people create healthchecks in the first place is to have integration with these systems. Even though the systems are clearly limited in their ability to read HTTP responses correctly, they should be supported.

Irakli Nadareishvili · Answer 1 · Wed Jan 17 2018 19:38:11 GMT+0800 (China Standard Time)

Yeah, that was my initial motivation for suggesting that the health check endpoint should also rely heavily on HTTP response codes to indicate the status of the service. That way, monitoring systems that ignore the response body would go by the response codes.

However, as pointed out by @pmhsfelix and @dret in #4, that has pretty bad implications. An HTTP endpoint should not use response codes to indicate the health of another endpoint; those are for the health of the responding endpoint itself.

So now I don't know how exactly to achieve what you are talking about...

Randall Randall · Answer 2 · Wed Jan 17 2018 20:05:46 GMT+0800 (China Standard Time)

A separate issue with using response codes like this is that the healthcheck response may describe many (sub)systems, so that it's not clear what a response of 500 means; the use of 2xx for pass and 5xx for fail makes more sense in the context of a single (sub)system.

I suggest unification of the base fields and the detail fields, such that a base healthcheck document can be used as a details array item directly. This gives three benefits:

- it simplifies the spec, since the details item is no longer a separate kind of document
- it allows 200/500 as a pass/fail for a given component for simpler or legacy systems, as @danielbcorreia notes
- it opens the possibility for systems to specify more or less granular health checks if those are available, via self and/or up links
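One way to picture the unification is a single recursive type, where each details item is itself a small healthcheck document. This is only a sketch: the field names `status`, `self_link`, and `details`, and the `"pass"`/`"fail"` values, are assumptions for illustration rather than the spec's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Health:
    """A healthcheck document that can also serve as a details item."""
    status: str                       # assumed values: "pass" or "fail"
    self_link: Optional[str] = None   # where this component's own check lives
    details: List["Health"] = field(default_factory=list)


def status_code(doc: Health) -> int:
    """200/500 as pass/fail for the single component the document describes."""
    return 200 if doc.status == "pass" else 500


# Hypothetical service with one healthy and one failing component.
database = Health(status="fail", self_link="/health/database")
cache = Health(status="pass", self_link="/health/cache")
service = Health(status="fail", self_link="/health", details=[database, cache])

for component in service.details:
    print(component.self_link, status_code(component))
```

Because each component carries its own self link, a legacy client can be pointed at exactly one component and read an unambiguous status code from it.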

Irakli Nadareishvili · Answer 3 · Wed Jan 17 2018 23:59:07 GMT+0800 (China Standard Time)

@randallsquared not sure how unification of base fields and detail object fields (which is a conversation of its own if you want to open an issue about that) addresses @danielbcorreia's concern about clients that ignore the body?

Erik Wilde · Answer 4 · Thu Jan 18 2018 09:51:31 GMT+0800 (China Standard Time)

On 2018-01-17 03:28, Daniel Correia wrote: To avoid making the current endpoint diverge from the correct usage of status codes (see also #4), one option is to have a specific endpoint to handle this behaviour and that returns a 200 OK in case the service is healthy, or a 5xx when it is failing.

that sounds like a viable solution. what about including a link to such an endpoint as "watchdog" or something along these lines, if this really is a resource that is required?
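As a rough sketch of how a client might discover such a link, assuming the health document is JSON with a `links` object and a `watchdog` relation (both names are hypothetical, as is the example host):

```python
import json
from typing import Optional
from urllib.parse import urljoin


def watchdog_url(health_url: str, body: str) -> Optional[str]:
    """Return the absolute URL of the status-only resource, if one is linked."""
    doc = json.loads(body)
    href = doc.get("links", {}).get("watchdog")  # assumed link relation name
    return urljoin(health_url, href) if href else None


# Hypothetical health document advertising a status-only resource.
example_body = json.dumps({
    "status": "pass",
    "links": {"watchdog": "/health/watchdog"},
})

print(watchdog_url("https://api.example.com/health", example_body))
```

A load balancer would be configured once with the resolved URL, and from then on would look only at the status code that URL returns.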

Randall Randall · Answer 5 · Thu Jan 18 2018 11:16:24 GMT+0800 (China Standard Time)

not sure how unification of base fields and detail object fields [...] addresses @danielbcorreia's concern about clients that ignore the body?

It allows connecting a specific component with a status code. The "link to an endpoint" would then be the "self" link in the detail object (were the detail object a miniature healthcheck object).

Irakli Nadareishvili · Answer 6 · Mon Jan 29 2018 12:56:40 GMT+0800 (China Standard Time)

This discussion mirrors the discussion in #4. Let's move there, to consolidate.