digitalocean / ceph_exporter

Prometheus exporter that scrapes meta information about a ceph cluster.

Questions about configuration and the IO ops metrics

arjunnath opened this issue

Hi,

  1. After compiling and installing ceph_exporter on one node in our 32-node cluster, we noticed it's not showing the IO ops metrics as expected (see the quick check sketched below):
  • ceph_client_io_ops - seems to be OK and shows a number
  • ceph_client_io_read_bytes - always shows zero
  • ceph_client_io_read_ops - always shows zero

What could be wrong here?

  2. We've installed ceph_exporter on one client node only. Is this exporter meant to be installed on more than one node? It was not clear from the documentation; we would appreciate some clarity here.
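For reference, below is a minimal sketch of how we check what the exporter is actually serving for these series. The listen address :9128 is an assumption (the exporter's usual default); adjust the URL if your instance serves /metrics elsewhere.

package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	// The address is an assumption (the exporter's usual default listen
	// address); change it if your instance serves /metrics elsewhere.
	resp, err := http.Get("http://localhost:9128/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Print only the ceph_client_io_* samples so they are easy to compare
	// against the "client io ..." line of ceph -s.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if line := scanner.Text(); strings.HasPrefix(line, "ceph_client_io_") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}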

Thanks

Running from a single node should be sufficient. Can you provide the version of Ceph that you are running Ceph exporter with?

Our ceph version is: 0.94.10
Running on CentOS release 6.8
Ceph exporter was recently compiled from GitHub (about 3 days ago)

Thanks for that info! I do not have access to a Hammer cluster at this time; can you grab a snapshot of your ceph -s output that shows client I/O?

Did you mean "ceph -s" or "ceph -w"? I posted both below (with a benchmark running):
Could this issue be because the ceph command output is different in older versions of Ceph?


[root@hostxxx061 ~]# ceph -s
    cluster 7c266209-2890-43cc-980b-af5987b28810
     health HEALTH_OK
     monmap e1: 3 mons at {hostxxx061=10.21.231.161:6789/0,hostxxx062=10.21.231.162:6789/0,hostxxx063=10.21.231.163:6789/0}
            election epoch 22, quorum 0,1,2 hostxxx061,hostxxx062,hostxxx063
     osdmap e4857: 32 osds: 32 up, 32 in
      pgmap v167020: 2148 pgs, 4 pools, 514 GB data, 123 kobjects
            1034 GB used, 8243 GB / 9278 GB avail
                2148 active+clean
  client io 106 MB/s wr, 26 op/s

[root@hostxxx061 ~]# ceph -w                                                                               
    cluster 7c266209-2890-43cc-980b-af5987b28810
     health HEALTH_OK
     monmap e1: 3 mons at {hostxxx061=10.21.231.161:6789/0,hostxxx062=10.21.231.162:6789/0,hostxxx063=10.21.231.163:6789/0}
            election epoch 22, quorum 0,1,2 hostxxx061,hostxxx062,hostxxx063
     osdmap e4857: 32 osds: 32 up, 32 in
      pgmap v167014: 2148 pgs, 4 pools, 512 GB data, 123 kobjects
            1031 GB used, 8246 GB / 9278 GB avail
                2148 active+clean
  client io 111 MB/s wr, 27 op/s

2017-07-16 00:34:51.635735 mon.0 [INF] pgmap v167014: 2148 pgs: 2148 active+clean; 512 GB data, 1031 GB used, 8246 GB / 9278 GB avail; 111 MB/s wr, 27 op/s
2017-07-16 00:34:54.286179 osd.25 [INF] 11.2f9 scrub starts
2017-07-16 00:34:54.352769 osd.25 [INF] 11.2f9 scrub ok
2017-07-16 00:34:55.835085 mon.0 [INF] pgmap v167015: 2148 pgs: 2148 active+clean; 512 GB data, 1031 GB used, 8246 GB / 9278 GB avail; 115 MB/s wr, 28 op/s
2017-07-16 00:34:56.850851 mon.0 [INF] pgmap v167016: 2148 pgs: 2148 active+clean; 513 GB data, 1032 GB used, 8245 GB / 9278 GB avail; 102 MB/s wr, 25 op/s

Can you paste an instance of read bytes per second, as I believe that is the one that shows as 0? The output of ceph -s should be sufficient.

Before Jewel, Ceph did not have a way to distinguish between read and write ops, so you will need to rely on ceph_client_io_ops in Hammer until you upgrade.
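To make the difference concrete, here is a small sketch (not the exporter's actual parsing code) of the two summary-line shapes. The Hammer lines mirror the ceph -s output pasted in this thread; the Jewel-style line with the per-direction op/s split is an assumed example of the newer format.

// Illustrative only: shows why per-direction op/s cannot be derived from a
// Hammer-era "client io" summary line.
package main

import (
	"fmt"
	"regexp"
)

var (
	// Hammer carries a single aggregate op/s figure, and only the direction
	// that currently has traffic, e.g. "client io 106 MB/s wr, 26 op/s".
	hammerOps = regexp.MustCompile(`(\d+) op/s$`)
	// Jewel and later split ops by direction (assumed example format),
	// e.g. "client io 8185 B/s rd, 573 kB/s wr, 9 op/s rd, 87 op/s wr".
	jewelOps = regexp.MustCompile(`(\d+) op/s rd, (\d+) op/s wr`)
)

func main() {
	lines := []string{
		"client io 106 MB/s wr, 26 op/s",                            // Hammer, write benchmark
		"client io 122 MB/s rd, 30 op/s",                            // Hammer, read benchmark
		"client io 8185 B/s rd, 573 kB/s wr, 9 op/s rd, 87 op/s wr", // assumed Jewel-style line
	}
	for _, l := range lines {
		if m := jewelOps.FindStringSubmatch(l); m != nil {
			fmt.Printf("split ops:  rd=%s wr=%s  (%s)\n", m[1], m[2], l)
		} else if m := hammerOps.FindStringSubmatch(l); m != nil {
			fmt.Printf("total ops:  %s, no rd/wr split  (%s)\n", m[1], l)
		}
	}
}

The point is simply that a Hammer status line only ever carries a single aggregate op/s figure, so there is nothing to derive ceph_client_io_read_ops from until the cluster emits the newer format.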

Can you paste an instance of read bytes per second, as I believe that is the one that shows as 0? The output of ceph -s should be sufficient.

I don't understand - I did paste the ceph -s output in my previous post. Did I miss something?
Should I post the ceph_exporter metrics for IO?

Before Jewel, Ceph did not have a way to distinguish between read and write ops, so you will need to rely on ceph_client_io_ops in Hammer until you upgrade.

I think this answers our question.
It seems ceph_exporter does not grab all metrics from older versions of Ceph. It would be good to have a note about this in the ceph_exporter instructions.

Also, I suspect there may be other metrics that aren't available from older Ceph (i.e., 0.94.x). I'll dig into that shortly.

The output of ceph -s you pasted did not contain the read bytes per second, only writes. I was curious to see it to identify why we are not surfacing it for Hammer, since it should technically be supported.

I have tried mentioning the supported releases in the Dependencies section, but point taken on being clearer. Especially because, as we keep extending support for newer releases, the existing stats might not be backward compatible, and we'll need to evaluate how feasible it is to keep supporting older versions.

The output of ceph -s you pasted did not contain the read bytes per second, only writes. I was curious to see it to identify why we are not surfacing it for Hammer, since it should technically be supported.

I checked again, and this is what the last line of the ceph -s output shows:

Write benchmark:
client io 116 MB/s wr, 29 op/s

Read benchmark:
client io 122 MB/s rd, 30 op/s

As you mentioned, there may be some backward-compatibility issues between ceph_exporter and Ceph 0.94.x. I'll post here if I find more in the next few days.

Thanks for your help. We appreciate the quick response.

Should we close this issue?