digitalocean / ceph_exporter

Prometheus exporter that scrapes meta information about a ceph cluster.


Exporter fails when some OSDs are not online (luminous)

goebi opened this issue

commented

The ceph_exporter from the luminous-2.0.0 branch fails when some OSDs are not online. It seems there are duplicate values. The Ceph release is 12.2.7. We are using the official Docker container with the luminous-2.0.0 tag.

An error has occurred during metrics gathering:

4 error(s) occurred:

  • collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.40" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values
  • collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.35" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values
  • collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.36" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values
  • collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.41" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values

When all OSDs are online, everything works as expected.
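For context, that error comes from the Prometheus Go client's registry, which rejects a scrape in which a collector emits the same metric name with identical label values more than once. A minimal sketch of the failure mode (the collector below is illustrative, not the exporter's actual code):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// dupCollector emits the same series twice, mimicking an OSD that shows
// up more than once in the data the exporter walks.
type dupCollector struct {
	desc *prometheus.Desc
}

func (c dupCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c dupCollector) Collect(ch chan<- prometheus.Metric) {
	for i := 0; i < 2; i++ { // same name and label values, emitted twice
		ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 1,
			"ceph", "osd.40", "down")
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(dupCollector{
		desc: prometheus.NewDesc("ceph_osd_down", "Whether an OSD is down",
			[]string{"cluster", "osd", "status"}, nil),
	})
	// Gather fails with: collected metric ... was collected before with
	// the same name and label values
	if _, err := reg.Gather(); err != nil {
		fmt.Println(err)
	}
}
```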

Any news here? This is a blocker for us: the luminous-2.0.0 tag is not usable at the moment because of this issue.

Hi, would it be possible to share the output of ceph osd tree down -f json-pretty on your cluster? I have a hunch that OSDs in nodes and stray arrays might have overlapped.
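If that hypothesis holds, one defensive fix on the collector side is to track which OSD IDs have already been emitted while walking both arrays, so an OSD that appears under a CRUSH bucket in nodes and again in stray only produces one sample. A rough sketch under that assumption (types and field names are illustrative, not the exporter's actual code):

```go
package main

import "fmt"

// osdNode is a simplified view of one entry in the "nodes"/"stray"
// arrays of `ceph osd tree down -f json` output.
type osdNode struct {
	ID     int64
	Name   string
	Type   string
	Status string
}

// dedupOSDs merges the nodes and stray arrays, keeping each OSD ID once.
func dedupOSDs(nodes, stray []osdNode) []osdNode {
	seen := make(map[int64]bool)
	var out []osdNode
	for _, list := range [][]osdNode{nodes, stray} {
		for _, n := range list {
			if n.Type != "osd" || seen[n.ID] {
				continue
			}
			seen[n.ID] = true
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []osdNode{{ID: 40, Name: "osd.40", Type: "osd", Status: "down"}}
	stray := []osdNode{{ID: 40, Name: "osd.40", Type: "osd", Status: "down"}} // same OSD again
	for _, o := range dedupOSDs(nodes, stray) {
		// Each OSD now yields exactly one ceph_osd_down sample.
		fmt.Printf("ceph_osd_down{osd=%q,status=%q} 1\n", o.Name, o.Status)
	}
}
```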

We have more than 30 clusters, so I will only use one as an example. It seems there are duplicate values.

  • collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.56" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
  • collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.169" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
  • collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.45" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
  • collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.158" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
  • collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.49" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values

Hi, would it be possible to share the output of ceph osd tree down -f json-pretty on your cluster? I have a hunch that OSDs in nodes and stray arrays might have overlapped.

How do I resolve this problem?

It was caused by using the command "ceph osd crush link xxxx".
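For what it's worth, `ceph osd crush link` attaches an existing bucket (for example a host) under an additional parent, so its OSDs can be reachable through more than one branch of the CRUSH tree and may then be listed more than once in the tree dump the exporter parses. A quick way to check a cluster for such duplicates, assuming the usual nodes/stray layout of `ceph osd tree -f json` (verify the field names against your own output; the file name below is arbitrary):

```go
// Usage: ceph osd tree -f json | go run find_dup_osds.go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type osdNode struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
	Type string `json:"type"`
}

type treeDump struct {
	Nodes []osdNode `json:"nodes"`
	Stray []osdNode `json:"stray"`
}

func main() {
	var dump treeDump
	if err := json.NewDecoder(os.Stdin).Decode(&dump); err != nil {
		fmt.Fprintln(os.Stderr, "decode:", err)
		os.Exit(1)
	}
	// Count how often each OSD name occurs across both arrays.
	counts := map[string]int{}
	for _, list := range [][]osdNode{dump.Nodes, dump.Stray} {
		for _, n := range list {
			if n.Type == "osd" {
				counts[n.Name]++
			}
		}
	}
	for name, c := range counts {
		if c > 1 {
			fmt.Printf("%s appears %d times in the tree dump\n", name, c)
		}
	}
}
```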

commented

Hi, this shouldn't be an issue anymore with our latest nautilus-based release. Please re-open if you experience this again.