digitalocean / ceph_exporter

Prometheus exporter that scrapes meta information about a ceph cluster.

Format of ceph health changes jewel -> luminous

jan--f opened this issue

The PR for the new format has already been merged to master, so the change will soon appear in distro packages. When it does, ceph_exporter will break.
Details about the new format can be found here and here.

Ideally the exporter could be adjusted to be able to deal with both formats.

I agree. It shouldn't be hard to keep ceph_exporter backwards compatible, i.e. able to handle both formats.
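A minimal sketch of what handling both formats could look like, not the exporter's actual code. It assumes the Jewel layout exposes `overall_status` and a `summary` list under `health`, while the Luminous layout exposes `status` and a `checks` map keyed by check code. These field names, and the `parseHealth` helper itself, are assumptions for illustration and should be checked against real `ceph status --format json` output.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sort"
)

// healthDoc models only the health fields this sketch reads. All JSON field
// names are assumptions about the two layouts, not a confirmed schema.
type healthDoc struct {
	Health struct {
		// Pre-Luminous (Jewel) layout.
		OverallStatus string `json:"overall_status"`
		Summary       []struct {
			Summary string `json:"summary"`
		} `json:"summary"`
		// Luminous layout.
		Status string `json:"status"`
		Checks map[string]struct {
			Summary struct {
				Message string `json:"message"`
			} `json:"summary"`
		} `json:"checks"`
	} `json:"health"`
}

// parseHealth returns the cluster status and health messages, preferring the
// Luminous fields and falling back to the Jewel ones.
func parseHealth(raw []byte) (string, []string, error) {
	var doc healthDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", nil, err
	}
	h := doc.Health
	if h.Status != "" {
		msgs := make([]string, 0, len(h.Checks))
		for code, check := range h.Checks {
			msgs = append(msgs, code+": "+check.Summary.Message)
		}
		sort.Strings(msgs) // map iteration order is not stable
		return h.Status, msgs, nil
	}
	if h.OverallStatus != "" {
		msgs := make([]string, 0, len(h.Summary))
		for _, s := range h.Summary {
			msgs = append(msgs, s.Summary)
		}
		return h.OverallStatus, msgs, nil
	}
	return "", nil, errors.New("no health status field found")
}

func main() {
	// Hand-written sample documents, not captured from a real cluster.
	jewel := []byte(`{"health":{"overall_status":"HEALTH_WARN","summary":[{"severity":"HEALTH_WARN","summary":"1 osds down"}]}}`)
	luminous := []byte(`{"health":{"status":"HEALTH_WARN","checks":{"OSD_DOWN":{"severity":"HEALTH_WARN","summary":{"message":"1 osds down"}}}}}`)
	for _, raw := range [][]byte{jewel, luminous} {
		status, msgs, err := parseHealth(raw)
		fmt.Println(status, msgs, err)
	}
}
```

Checking the newer field first means a cluster that reports both layouts is read through the Luminous path.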

@jan--f have you tried out a luminous build with these changes yet? I'm seeing that, besides the change in format for messages, all the mon stats have disappeared. Specifically, the section under "health -> health_services -> mons" is gone. Do you know where these stats went? I don't see them in the new mgr status.

To follow up, Sage indicated that he didn't think the mon metrics were being used or needed by anyone, but he could add them back if they are actually helpful.

For a workaround (at least for the health stats) one can add mon_health_preluminous_compat=true to ceph.conf.

hi @jan--f, did the client/recovery/cache I/O output of ceph status --format plain also get removed or moved elsewhere in Luminous? The lines in this function use the plain status output to look for client io, recovery io, and cache io, but I no longer see them in the luminous output: https://github.com/digitalocean/ceph_exporter/blob/master/collectors/health.go#L741

Any idea how to collect those values still? Client I/O and recovery I/O (but not cache I/O) look to be available per pool with ceph osd pool stats --format=json, but I'm wondering if those stats are still available top level (aggregate). Thanks!
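If the per-pool numbers turn out to be the only source, an aggregate could be rebuilt by summing them. The sketch below assumes `ceph osd pool stats --format=json` returns a list of pools, each with `client_io_rate` and `recovery_rate` objects holding `read_bytes_sec`, `write_bytes_sec`, and `recovering_bytes_per_sec`. Those field names, and the `sumPoolIO` and `ioTotals` names, are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// poolStats mirrors the per-pool entries this sketch reads. Field names are
// assumptions about `ceph osd pool stats --format=json`; rates missing from
// an idle pool simply decode as zero.
type poolStats struct {
	PoolName     string `json:"pool_name"`
	ClientIORate struct {
		ReadBytesSec  float64 `json:"read_bytes_sec"`
		WriteBytesSec float64 `json:"write_bytes_sec"`
	} `json:"client_io_rate"`
	RecoveryRate struct {
		RecoveringBytesPerSec float64 `json:"recovering_bytes_per_sec"`
	} `json:"recovery_rate"`
}

// ioTotals holds cluster-wide sums in bytes per second.
type ioTotals struct {
	ClientRead  float64
	ClientWrite float64
	Recovery    float64
}

// sumPoolIO adds up the per-pool rates to approximate the aggregate values.
func sumPoolIO(raw []byte) (ioTotals, error) {
	var pools []poolStats
	var t ioTotals
	if err := json.Unmarshal(raw, &pools); err != nil {
		return t, err
	}
	for _, p := range pools {
		t.ClientRead += p.ClientIORate.ReadBytesSec
		t.ClientWrite += p.ClientIORate.WriteBytesSec
		t.Recovery += p.RecoveryRate.RecoveringBytesPerSec
	}
	return t, nil
}

func main() {
	// Hand-written sample: one busy pool, one idle pool with empty objects.
	raw := []byte(`[
		{"pool_name":"rbd","client_io_rate":{"read_bytes_sec":1024,"write_bytes_sec":2048},"recovery_rate":{"recovering_bytes_per_sec":512}},
		{"pool_name":"idle","client_io_rate":{},"recovery_rate":{}}
	]`)
	totals, err := sumPoolIO(raw)
	fmt.Println(totals, err)
}
```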

Actually I do see client IO in the ceph status output, but it has to be active to be printed. Same with recovery and cache IO.

Instead of updating the plain-text parsing, whose format changed in Luminous, it is probably best to parse the JSON instead. The JSON format can be gleaned from the client, cache, and recovery functions called from here in Luminous: https://github.com/ceph/ceph/blob/138f08d5df311d9e4987819a792c01838dc36806/src/mon/PGMap.cc#L253
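A rough sketch of the JSON approach, assuming `ceph status --format json` carries the aggregate rates under `pgmap` as `read_bytes_sec`, `write_bytes_sec`, `recovering_bytes_per_sec`, and `flush_bytes_sec`, and omits each key while that kind of I/O is idle. The field names and the `parseClusterIO` helper are assumptions for illustration and should be verified against the PGMap code linked above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusDoc reads only the pgmap rate fields. The JSON names are assumptions.
// A key that is absent because the I/O is idle decodes to the zero value, so
// no special case is needed for a quiet cluster.
type statusDoc struct {
	PGMap struct {
		ReadBytesSec          float64 `json:"read_bytes_sec"`
		WriteBytesSec         float64 `json:"write_bytes_sec"`
		RecoveringBytesPerSec float64 `json:"recovering_bytes_per_sec"`
		FlushBytesSec         float64 `json:"flush_bytes_sec"`
	} `json:"pgmap"`
}

// clusterIO holds the aggregate rates in bytes per second.
type clusterIO struct {
	ClientRead  float64
	ClientWrite float64
	Recovery    float64
	CacheFlush  float64
}

// parseClusterIO extracts the aggregate I/O rates from the status JSON.
func parseClusterIO(raw []byte) (clusterIO, error) {
	var doc statusDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return clusterIO{}, err
	}
	return clusterIO{
		ClientRead:  doc.PGMap.ReadBytesSec,
		ClientWrite: doc.PGMap.WriteBytesSec,
		Recovery:    doc.PGMap.RecoveringBytesPerSec,
		CacheFlush:  doc.PGMap.FlushBytesSec,
	}, nil
}

func main() {
	// Hand-written samples: an active cluster and an idle one.
	active := []byte(`{"pgmap":{"num_pgs":64,"read_bytes_sec":4096,"write_bytes_sec":8192}}`)
	idle := []byte(`{"pgmap":{"num_pgs":64}}`)
	for _, raw := range [][]byte{active, idle} {
		io, err := parseClusterIO(raw)
		fmt.Println(io, err)
	}
}
```

Unlike matching lines in the plain output, decoding into typed fields reports zero for an idle cluster instead of failing to find a line.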

@jan--f

> For a workaround (at least for the health stats) one can add mon_health_preluminous_compat=true to ceph.conf.

This would probably break oA's health display.

@jbw976 Yeah, the I/O parts are not working for me either. I agree about plain vs. JSON parsing. No idea why this implementation was chosen.
Also, the JSON format changed as well, so parsing the JSON won't spare us the upgrade pains.

FWIW I'm also working on a mgr plugin that exports Prometheus metrics. It's not equivalent to ceph_exporter but should export roughly the same metrics (with differences in naming and labels, though). See ceph/ceph#16990

This should no longer be a problem. :) Feel free to re-open if you're having problems with client IO metrics