inadarei / rfc-healthcheck

Health Check Response RFC Draft for HTTP APIs

Home Page: https://inadarei.github.io/rfc-healthcheck/

resource for clients that ignore the response body

danielbcorreia opened this issue

One thing that seems to be missing is a reference to monitoring systems that rely only on the response status code to make decisions. A common example is the load balancer, which ignores the response body entirely and looks only at the status code to decide whether a node should remain in the pool.

To avoid making the current resource diverge from the correct usage of status codes (see also #4), one option is a dedicated resource for this behaviour that returns 200 OK when the service is healthy and a 5xx when it is failing.

In my view this is very important, as one of the main reasons people create health checks in the first place is to integrate with these systems. Even though these systems are clearly limited in their ability to read HTTP responses, they should be supported.
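To make the proposal concrete, here is a minimal sketch of such a body-agnostic resource as a Python WSGI app. It assumes the main health document exposes a top-level `status` field with the values `pass`, `warn` and `fail`; the `make_status_app` name, mapping `warn` to 200, and picking 503 as the 5xx are illustrative assumptions, not something the draft prescribes.

```python
import json
from typing import Callable, Iterable

# Assumption: the health document has a top-level "status" field whose
# value is "pass", "warn" or "fail".
HTTP_STATUS_FOR = {
    "pass": "200 OK",
    "warn": "200 OK",  # degraded but still serving, so keep it in the pool
    "fail": "503 Service Unavailable",
}


def make_status_app(check_health: Callable[[], dict]):
    """Build a WSGI app for a hypothetical status-code-only resource
    (e.g. mounted at /health/lb) aimed at load balancers."""

    def app(environ, start_response) -> Iterable[bytes]:
        health = check_health()
        # Unknown or missing status is treated as a failure, so a broken
        # check never keeps a node in the pool by accident.
        status_line = HTTP_STATUS_FOR.get(
            health.get("status"), "503 Service Unavailable"
        )
        # The body is still sent for clients that do read it.
        body = json.dumps(health).encode("utf-8")
        start_response(status_line, [
            ("Content-Type", "application/health+json"),
            ("Content-Length", str(len(body))),
            ("Cache-Control", "no-cache"),
        ])
        return [body]

    return app
```

A load balancer pointed at this resource only sees 200 or 503, while the main health resource keeps using status codes for its own health, as discussed in #4.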

Yeah, that was my initial motivation for suggesting that the health check endpoint should also make heavy use of HTTP response codes to indicate the status of the service. That way, monitoring systems that ignore the response body would go by the response codes.

However, as @pmhsfelix and @dret pointed out in #4, that has pretty bad implications. An HTTP endpoint should not use response codes to indicate the health of another endpoint; those codes describe the health of the responding endpoint itself.

So now I don't know how exactly to achieve what you are talking about...

A separate issue with using response codes like this is that the healthcheck response may describe many (sub)systems, so that it's not clear what a response of 500 means; the use of 2xx for pass and 5xx for fail makes more sense in the context of a single (sub)system.
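The ambiguity can be shown in a few lines. This sketch assumes a hypothetical worst-of roll-up, where the overall status is the most severe component status, and a 503-on-fail mapping; neither rule comes from the draft. Two different failures collapse into the same code.

```python
from typing import Mapping

# Hypothetical severity order used for a worst-of roll-up.
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def rollup(components: Mapping[str, str]) -> str:
    """Overall status is the most severe component status."""
    return max(components.values(), key=lambda s: SEVERITY[s], default="pass")


def http_code(status: str) -> int:
    return 503 if status == "fail" else 200


db_down = {"db": "fail", "cache": "pass"}
cache_down = {"db": "pass", "cache": "fail"}

# Both print 503: the code alone cannot say which subsystem failed.
print(http_code(rollup(db_down)), http_code(rollup(cache_down)))  # prints: 503 503
```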

I suggest unifying the base fields and the detail fields, so that a base healthcheck document can be used directly as an item in the details array. This gives three benefits:

  • it simplifies the spec, since the details item is no longer a separate kind of document
  • it allows 200/500 to signal pass/fail for a given component, for simpler or legacy systems, as @danielbcorreia notes
  • it opens the possibility for systems to specify more or less granular health checks if those are available, via self and/or up links
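A minimal sketch of what the unification could look like, assuming a single recursive `HealthCheck` shape with `status`, `links` (holding `self` and/or `up`) and `details`. The class and method names are hypothetical and the field names illustrative. Because each document describes exactly one (sub)system, a 200/500 pass/fail code can be derived from it without the ambiguity of a multi-component response.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthCheck:
    """One shape for both the top-level document and each details item."""

    status: str  # "pass" | "warn" | "fail"
    links: Dict[str, str] = field(default_factory=dict)  # e.g. "self", "up"
    details: List["HealthCheck"] = field(default_factory=list)

    def to_dict(self) -> dict:
        doc: dict = {"status": self.status}
        if self.links:
            doc["links"] = dict(self.links)
        if self.details:
            # Each item is serialized exactly like a top-level document.
            doc["details"] = [d.to_dict() for d in self.details]
        return doc

    def http_status(self) -> int:
        # Pass/fail for the single (sub)system this document describes.
        return 500 if self.status == "fail" else 200


# Hypothetical URLs, for illustration only.
db = HealthCheck(
    "fail",
    links={
        "self": "https://api.example.com/health/db",
        "up": "https://api.example.com/health",
    },
)
root = HealthCheck(
    "fail", links={"self": "https://api.example.com/health"}, details=[db]
)
```

A details item like `db` serializes to the same structure whether it is embedded in `root` or served on its own at its `self` link.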

@randallsquared not sure how unification of base fields and detail object fields (which is a conversation of its own if you want to open an issue about that) addresses @danielbcorreia's concern about clients that ignore the body?

> not sure how unification of base fields and detail object fields [...] addresses @danielbcorreia's concern about clients that ignore the body?

It allows connecting a specific component with a status code. The "link to an endpoint" would then be the "self" link in the detail object (were the detail object a miniature healthcheck object).
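A sketch of the server side of that idea, assuming every (sub)document carries a `links.self` URL: index the documents by the path of their `self` link, so a `GET` on that path returns the component's own document with its own pass/fail code. The function names, the 404 for unknown paths, and the example URLs are hypothetical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


def index_by_self(doc: dict, index: Optional[Dict[str, dict]] = None) -> Dict[str, dict]:
    """Map the path of each (sub)document's "self" link to that document."""
    index = {} if index is None else index
    self_link = doc.get("links", {}).get("self")
    if self_link:
        index[urlsplit(self_link).path] = doc
    for item in doc.get("details", []):
        index_by_self(item, index)
    return index


def respond(index: Dict[str, dict], path: str) -> Tuple[int, dict]:
    """Status code reflects only the component served at this path."""
    doc = index.get(path)
    if doc is None:
        return 404, {}
    return (500 if doc.get("status") == "fail" else 200), doc


# Hypothetical document in which each details item is a miniature healthcheck.
doc = {
    "status": "fail",
    "links": {"self": "https://api.example.com/health"},
    "details": [
        {"status": "fail", "links": {"self": "https://api.example.com/health/db"}},
        {"status": "pass", "links": {"self": "https://api.example.com/health/cache"}},
    ],
}
idx = index_by_self(doc)
```

With this, a body-ignoring probe could be pointed at `/health/cache` and get 200 even though `/health/db`, and therefore the top-level document, reports 500.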

This discussion mirrors the one in #4. Let's move there to consolidate.