digitalocean / ceph_exporter

The Nautilus version of the exporter reports osdmap health metrics (ceph_osds_up, for instance) as zero when run against Ceph Octopus.

Octopus has done away with the second layer of OSDMap in the JSON struct returned by ceph status --format json, so the counters now sit directly under the top-level osdmap key.

Line 928 in 339d20f

OSDMap struct {

This OSDMap struct will need to be de-dented, i.e. unwrapped from the second layer of OSDMap.

Lines 929 to 934 in 339d20f

    
    OSDMap struct {
        NumOSDs        float64 `json:"num_osds"`
        NumUpOSDs      float64 `json:"num_up_osds"`
        NumInOSDs      float64 `json:"num_in_osds"`
        NumRemappedPGs float64 `json:"num_remapped_pgs"`
    } `json:"osdmap"`

Here is an example ceph status --format json output from octopus:

{"fsid":"39254ea8-149f-11eb-b705-fa163e016da5","health":{"status":"HEALTH_OK","checks":{},"mutes":[]},"election_epoch":3,"quorum":[0],"quorum_names":["juju-13f892-test-0"],"quorum_age":5926,"monmap":{"epoch":1,"min_mon_release_name":"octopus","num_mons":1},"osdmap":{"epoch":19,"num_osds":3,"num_up_osds":3,"osd_up_since":1603396089,"num_in_osds":3,"osd_in_since":1603396089,"num_remapped_pgs":0},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":1}],"num_pgs":1,"num_pools":1,"num_objects":0,"data_bytes":0,"bytes_used":3229286400,"bytes_avail":28970385408,"bytes_total":32199671808},"fsmap":{"epoch":1,"by_rank":[],"up:standby":0},"mgrmap":{"available":true,"num_standbys":0,"modules":["iostat","restful"],"services":{}},"servicemap":{"epoch":1,"modified":"2020-10-22T19:46:12.631525+0000","services":{}},"progress_events":{}}
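For other readers, here is a minimal sketch of one way a decoder could tolerate both layouts: the flat osdmap object shown above (Octopus) and the older shape with a second osdmap object nested inside the first (Nautilus, as implied by the struct quoted earlier). The names OSDCounts, cephStatus, and parseOSDCounts are hypothetical and made up for this illustration; this is not the exporter's actual code, nor necessarily how the project fixed it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OSDCounts holds the four OSD counters discussed in this issue.
// (Hypothetical type name, used only for this sketch.)
type OSDCounts struct {
	NumOSDs        float64 `json:"num_osds"`
	NumUpOSDs      float64 `json:"num_up_osds"`
	NumInOSDs      float64 `json:"num_in_osds"`
	NumRemappedPGs float64 `json:"num_remapped_pgs"`
}

// cephStatus models only the "osdmap" part of `ceph status --format json`.
type cephStatus struct {
	OSDMap struct {
		// Flat layout: counters sit directly under "osdmap".
		OSDCounts
		// Nested layout: counters sit under "osdmap"."osdmap".
		// The pointer stays nil when the inner object is absent.
		Nested *OSDCounts `json:"osdmap"`
	} `json:"osdmap"`
}

// parseOSDCounts returns the counters from either layout,
// preferring the nested object when it is present.
func parseOSDCounts(data []byte) (OSDCounts, error) {
	var s cephStatus
	if err := json.Unmarshal(data, &s); err != nil {
		return OSDCounts{}, err
	}
	if s.OSDMap.Nested != nil {
		return *s.OSDMap.Nested, nil
	}
	return s.OSDMap.OSDCounts, nil
}

func main() {
	// Trimmed-down inputs; unknown keys such as "epoch" are ignored.
	flat := []byte(`{"osdmap":{"epoch":19,"num_osds":3,"num_up_osds":3,"num_in_osds":3,"num_remapped_pgs":0}}`)
	nested := []byte(`{"osdmap":{"osdmap":{"epoch":19,"num_osds":3,"num_up_osds":3,"num_in_osds":3,"num_remapped_pgs":0}}}`)

	for _, in := range [][]byte{flat, nested} {
		c, err := parseOSDCounts(in)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("total=%v up=%v in=%v remapped=%v\n",
			c.NumOSDs, c.NumUpOSDs, c.NumInOSDs, c.NumRemappedPGs)
	}
}
```

Decoding the flat Octopus output into a struct that only knows the nested layout leaves every counter at its zero value, without any error from json.Unmarshal, which matches the symptom reported here.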

Thanks @afreiberger! We have a bit of work to do on our side before we're ready for Octopus support. Namely, we'd like to get rid of the branch-per-release and support multiple active Ceph releases on the main branch. We have this work planned; just need to find the time.

That sounds like EXTREMELY worthy work. I wish I were a bit more handy with golang to help out. We've been working out how to handle the separate branches in the juju charm (https://jaas.ai/prometheus-ceph-exporter), which uses a snapcraft build of your project on the back-end to expose these metrics within our devops application models. Consolidating the branches into one project for all releases would significantly help my team with this effort.

I will note for other travellers to this issue that the Nautilus code does align better with the performance statistics model of Octopus than prior branches do, so it still matches the Octopus data models well for anything besides general mon cluster health.

This is fixed in the 4.0-dev branch which supports Nautilus, Octopus, and Pacific. We should have a release candidate soon. :)

4.0.0 is now released :)


osdmap health details are not found when monitoring ceph octopus with 3.0.0-nautilus