inadarei / rfc-healthcheck

Health Check Response RFC Draft for HTTP APIs

Home Page: https://inadarei.github.io/rfc-healthcheck/

is details a really good name?

Kyslik opened this issue

I really cringe when I read details in the response; I feel it should be named services or components. The RFC itself describes details as follows:

details: (optional) an object representing status of sub-components of the service in question

Emphasis mine

Overall, I would expect details to pair a human-readable reason with a machine-readable one (a code) explaining why something is not in pristine condition. It could also carry links to documentation for the specific code; I tried to compose a minimal example, so that is not included:

{
  "status": "fail",
  "services": {
    "cassandra": [
      {
        "type": "datastore",
        "status": "fail",
        "details": {
          "reason": "Connection error.",
          "code": 10061
        }
      }
    ]
  },
  "details": {
    "reason": "A critical service is not working.",
    "code": 123
  }
}

And of course each sub-component / service (not a detail) shall have its own details.


Also, the RFC is really pushing metrics everywhere. I think a health check is a simple boolean kind of endpoint that answers only one question:

Can I use the API right now?

  • If not, where can I find more details?

I quickly googled around and found an example (image, with a link to its source).

Thank you, Martin. You make an interesting point in objecting to details. In reality, what the section reports is the health of downstream dependencies. A field name like "downstream-dependencies" feels too long, however...

As for the spec pushing metrics everywhere, I will have to disagree. The main reason:

Metrics are optional. You can absolutely use this spec to get a level of detail that stops at "can I use it or not". However, there are cases when a simple "pass" or "fail" approach isn't sufficient. Such an approach, while simple and often enough, is very reactive: you will only know things are bad once they start to "fail". There are plenty of cases when you may want to notice things deteriorating before they "fail", so that appropriate measures can be taken. The spec merely allows this, without mandating it or making it necessary, so in that regard I think it hits a good middle ground.
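
To make the "notice deterioration before failure" idea concrete, here is a minimal Python sketch. The function name `status_from_latency`, both thresholds, and the intermediate "warn" value are assumptions chosen for illustration; the comment above only mentions "pass" and "fail".

```python
def status_from_latency(latency_ms: float,
                        warn_at_ms: float = 200.0,
                        fail_at_ms: float = 1000.0) -> str:
    """Map an observed latency to a health status.

    The thresholds are hypothetical defaults, not values taken from the RFC.
    """
    if latency_ms >= fail_at_ms:
        return "fail"   # the dependency is effectively unusable
    if latency_ms >= warn_at_ms:
        return "warn"   # still usable, but degrading (assumed intermediate state)
    return "pass"
```

A purely boolean consumer could treat anything other than "pass" as unusable, while a monitoring system could alert on the intermediate state.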

Will any of this be included in a new version of the RFC (regarding details)? Should I try to find a good replacement for details?


Regarding metrics: I am still convinced that a health check should return only a boolean instead of an array of metrics; for example, https://status.github.com/api/status.json and https://docs.gitlab.com/ee/user/admin_area/monitoring/health_check.html. I guess it depends on who is going to use the service. Maybe I am mistaking it for a status endpoint, and sure, a health check should also show some more info.

Perhaps include an example without metrics in the RFC? (Maybe I just do not know how to read RFCs properly and fail to disregard the optional parts.)


Thank you for the time!

I had to publish the RFC since it had expired, but I am happy to hear suggestions for a better name for "details".

Regarding your other feedback: it is totally fine if you want to just return a boolean for a downstream dependency and not provide more details. This is how your example outputs would look per the RFC:

Original from GitHub status

{"status":"good","last_updated":"2018-10-08T22:39:11Z"}

RFC version of GitHub status

{"status" : "pass"}

(and caching/freshness negotiation is achieved using standard HTTP headers)
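
A consumer that only wants the yes/no answer can ignore everything except the top-level status. The following Python sketch shows one way to do that; the helper name `is_usable` is hypothetical, and treating unparseable or unexpected bodies as "not usable" is an assumption rather than something the RFC is quoted as requiring here.

```python
import json


def is_usable(body: str) -> bool:
    """Return True only when the top-level status is "pass"."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        # Assumption: a body that is not valid JSON counts as unhealthy.
        return False
    # Any optional fields (details, metrics) are simply ignored.
    return isinstance(doc, dict) and doc.get("status") == "pass"
```

For example, `is_usable('{"status" : "pass"}')` is true, while a body with `"status": "fail"` or any extra optional fields stripped away gives the same boolean answer.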

Original from GitLab status

{
   "queues_check" : {
      "status" : "ok"
   },
   "redis_check" : {
      "status" : "ok"
   },
   "shared_state_check" : {
      "status" : "ok"
   },
   "db_check" : {
      "status" : "ok"
   },
   "cache_check" : {
      "status" : "ok"
   }
}

RFC equivalent of the GitLab status

{
  "status": "pass",
  "details": {
    "queues_check": {"status": "pass"},
    "redis_check": {"status": "pass"},
    "shared_state_check": {"status": "pass"},
    "db_check": {"status": "pass"},
    "cache_check": {"status": "pass"}
  }
}
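
The mapping between the two GitLab shapes above can be sketched in a few lines of Python. The function name `gitlab_to_rfc` is hypothetical, and two rules are assumptions made for this sketch: any per-check status other than "ok" is treated as "fail", and the overall status is "pass" only when every check passes.

```python
def gitlab_to_rfc(checks: dict) -> dict:
    """Convert a GitLab-style health payload into the RFC-style shape."""
    # Assumption: "ok" maps to "pass"; anything else maps to "fail".
    details = {
        name: {"status": "pass" if check.get("status") == "ok" else "fail"}
        for name, check in checks.items()
    }
    # Assumption: a single failing check fails the whole service.
    all_pass = all(d["status"] == "pass" for d in details.values())
    return {"status": "pass" if all_pass else "fail", "details": details}
```

Note that with an empty input this sketch reports "pass" with an empty details object; a real service may prefer a different policy.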