digitalocean / ceph_exporter

Prometheus exporter that scrapes meta information about a ceph cluster.

osdmap health details are not found when monitoring Ceph Octopus with 3.0.0-nautilus

afreiberger opened this issue

The Nautilus version of the exporter sets osdmap health metrics (ceph_osds_up, for instance) to zero when run against Ceph Octopus.

Octopus has done away with the second layer of OSDMap in the JSON structure returned by ceph status --format json

OSDMap struct {

This OSDMap struct will need to be de-dented and unwrapped from the second layer of OSDMap:

OSDMap struct {
    NumOSDs        float64 `json:"num_osds"`
    NumUpOSDs      float64 `json:"num_up_osds"`
    NumInOSDs      float64 `json:"num_in_osds"`
    NumRemappedPGs float64 `json:"num_remapped_pgs"`
} `json:"osdmap"`
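For anyone wondering why the metrics drop to zero instead of producing an error, here is a minimal sketch. It assumes the Nautilus branch decodes into a doubly nested struct, as described above. The type and function names (osdCounts, nestedStatus, flatStatus, decodeNested, decodeFlat) are hypothetical and not taken from the exporter's source. Go's encoding/json leaves fields at their zero value when the matching key is absent, so the mismatch goes unnoticed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// osdCounts holds the OSD fields quoted in this issue.
type osdCounts struct {
	NumOSDs        float64 `json:"num_osds"`
	NumUpOSDs      float64 `json:"num_up_osds"`
	NumInOSDs      float64 `json:"num_in_osds"`
	NumRemappedPGs float64 `json:"num_remapped_pgs"`
}

// nestedStatus (hypothetical name) mirrors the two-layer osdmap layout.
type nestedStatus struct {
	OSDMap struct {
		OSDMap osdCounts `json:"osdmap"`
	} `json:"osdmap"`
}

// flatStatus (hypothetical name) mirrors the single-layer Octopus layout.
type flatStatus struct {
	OSDMap osdCounts `json:"osdmap"`
}

// octopusJSON is trimmed from the Octopus output posted in this issue.
const octopusJSON = `{"osdmap":{"epoch":19,"num_osds":3,"num_up_osds":3,"num_in_osds":3,"num_remapped_pgs":0}}`

// decodeNested decodes with the two-layer struct.
func decodeNested(data []byte) (osdCounts, error) {
	var s nestedStatus
	err := json.Unmarshal(data, &s)
	return s.OSDMap.OSDMap, err
}

// decodeFlat decodes with the single-layer struct.
func decodeFlat(data []byte) (osdCounts, error) {
	var s flatStatus
	err := json.Unmarshal(data, &s)
	return s.OSDMap, err
}

func main() {
	nested, errNested := decodeNested([]byte(octopusJSON))
	flat, errFlat := decodeFlat([]byte(octopusJSON))
	// Neither call returns an error; only the flat struct picks up the counts.
	fmt.Println("nested:", nested.NumUpOSDs, errNested)
	fmt.Println("flat:", flat.NumUpOSDs, errFlat)
}
```

With the Octopus sample, the nested decode succeeds but reports zero up OSDs, while the flat decode reports three.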

Here is an example of ceph status --format json output from Octopus:

{"fsid":"39254ea8-149f-11eb-b705-fa163e016da5","health":{"status":"HEALTH_OK","checks":{},"mutes":[]},"election_epoch":3,"quorum":[0],"quorum_names":["juju-13f892-test-0"],"quorum_age":5926,"monmap":{"epoch":1,"min_mon_release_name":"octopus","num_mons":1},"osdmap":{"epoch":19,"num_osds":3,"num_up_osds":3,"osd_up_since":1603396089,"num_in_osds":3,"osd_in_since":1603396089,"num_remapped_pgs":0},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":1}],"num_pgs":1,"num_pools":1,"num_objects":0,"data_bytes":0,"bytes_used":3229286400,"bytes_avail":28970385408,"bytes_total":32199671808},"fsmap":{"epoch":1,"by_rank":[],"up:standby":0},"mgrmap":{"available":true,"num_standbys":0,"modules":["iostat","restful"],"services":{}},"servicemap":{"epoch":1,"modified":"2020-10-22T19:46:12.631525+0000","services":{}},"progress_events":{}}
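As a rough illustration of how a single code path could accept both layouts, here is a sketch that probes for the inner osdmap key before decoding. This is not how the exporter does it. The helper parseOSDStats and the type osdStats are hypothetical, and the nested sample in main is a synthetic stand-in for the two-layer layout rather than real cluster output.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// osdStats holds the OSD fields quoted in this issue.
type osdStats struct {
	NumOSDs        float64 `json:"num_osds"`
	NumUpOSDs      float64 `json:"num_up_osds"`
	NumInOSDs      float64 `json:"num_in_osds"`
	NumRemappedPGs float64 `json:"num_remapped_pgs"`
}

// parseOSDStats (hypothetical helper) accepts either the single-layer
// layout or a two-layer osdmap.osdmap layout.
func parseOSDStats(status []byte) (osdStats, error) {
	// Keep the osdmap section raw so it can be inspected before decoding.
	var outer struct {
		OSDMap json.RawMessage `json:"osdmap"`
	}
	if err := json.Unmarshal(status, &outer); err != nil {
		return osdStats{}, err
	}
	if len(outer.OSDMap) == 0 {
		return osdStats{}, errors.New("status output has no osdmap section")
	}

	// Probe for a second osdmap layer inside the first.
	var probe struct {
		Inner json.RawMessage `json:"osdmap"`
	}
	if err := json.Unmarshal(outer.OSDMap, &probe); err != nil {
		return osdStats{}, err
	}
	payload := outer.OSDMap
	if len(probe.Inner) > 0 {
		payload = probe.Inner
	}

	var stats osdStats
	err := json.Unmarshal(payload, &stats)
	return stats, err
}

func main() {
	// Trimmed from the Octopus output above.
	flat := []byte(`{"osdmap":{"epoch":19,"num_osds":3,"num_up_osds":3,"num_in_osds":3,"num_remapped_pgs":0}}`)
	// Synthetic two-layer sample, for illustration only.
	nested := []byte(`{"osdmap":{"osdmap":{"num_osds":3,"num_up_osds":3,"num_in_osds":3,"num_remapped_pgs":0}}}`)

	for _, in := range [][]byte{flat, nested} {
		stats, err := parseOSDStats(in)
		fmt.Println(stats.NumUpOSDs, err)
	}
}
```

Returning an error when the osdmap section is missing entirely makes a future layout change show up as a failure rather than as silent zeros.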

Thanks @afreiberger! We have a bit of work to do on our side before we're ready for Octopus support. Namely, we'd like to get rid of the branch-per-release and support multiple active Ceph releases on the main branch. We have this work planned; just need to find the time.

That sounds like EXTREMELY worthy work. I wish I were a bit more handy with golang to help out. We've been working out how to handle the separate branches in the Juju charms (https://jaas.ai/prometheus-ceph-exporter), which use a snapcraft build of your project on the back end to expose these metrics within our devops application models. Consolidating the branches into one project for all releases would significantly help my team with this effort.

I will note for other travellers to this issue that the Nautilus code aligns better with the performance statistics model of Octopus than prior branches do, so it still matches the Octopus data models well for everything besides general mon cluster health.

This is fixed in the 4.0-dev branch, which supports Nautilus, Octopus, and Pacific. We should have a release candidate soon. :)

4.0.0 is now released :)