Exporter fails when some OSDs are not online (luminous)
goebi opened this issue
The ceph_exporter from the luminous-2.0.0 branch fails when some OSDs are not online; it seems there are duplicate values. The Ceph release is 12.2.7, and we are using the official Docker container with the luminous-2.0.0 tag.
An error has occurred during metrics gathering:
4 error(s) occurred:
- collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.40" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values
- collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.35" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values
- collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.36" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values
- collected metric ceph_osd_down label:<name:"cluster" value:"ceph" > label:<name:"osd" value:"osd.41" > label:<name:"status" value:"down" > gauge:<value:1 > was collected before with the same name and label values
When all OSDs are online, everything works as expected.
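For context, this error is raised by the Prometheus Go client's registry, which rejects two samples with an identical metric name and label set in a single scrape. Below is a minimal sketch (not the exporter's actual code) that reproduces the same failure mode by emitting one OSD's gauge twice, as an exporter would if the OSD appeared in two places in the cluster tree:

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// dupCollector sends the same gauge twice with identical label values,
// mimicking an exporter that reports one OSD from two places.
type dupCollector struct {
	desc *prometheus.Desc
}

func (c *dupCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *dupCollector) Collect(ch chan<- prometheus.Metric) {
	for i := 0; i < 2; i++ {
		ch <- prometheus.MustNewConstMetric(
			c.desc, prometheus.GaugeValue, 1,
			"ceph", "osd.40", "down",
		)
	}
}

func main() {
	desc := prometheus.NewDesc(
		"ceph_osd_down", "Whether an OSD is down",
		[]string{"cluster", "osd", "status"}, nil,
	)
	reg := prometheus.NewRegistry()
	reg.MustRegister(&dupCollector{desc: desc})

	// Gather fails with "was collected before with the same name and
	// label values" -- the same error class shown in the report above.
	if _, err := reg.Gather(); err != nil {
		fmt.Println(err)
	}
}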
Any news here? This is a blocker: the luminous-2.0.0 tag is currently unusable because of it.
Hi, would it be possible to share the output of ceph osd tree down -f json-pretty on your cluster? I have a hunch that OSDs in the nodes and stray arrays might have overlapped.
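If that hunch is right, one way to avoid the duplicates would be to track which OSD IDs have already been reported while walking both arrays. A hypothetical Go sketch follows; the osdNode and osdTree structs mirror the nodes/stray keys of the ceph osd tree JSON output, but this is not the exporter's actual code:

package main

import (
	"encoding/json"
	"fmt"
)

// osdNode holds the fields of one entry in the ceph osd tree JSON.
type osdNode struct {
	ID     int    `json:"id"`
	Name   string `json:"name"`
	Type   string `json:"type"`
	Status string `json:"status"`
}

// osdTree covers the two arrays the maintainer suspects overlap.
type osdTree struct {
	Nodes []osdNode `json:"nodes"`
	Stray []osdNode `json:"stray"`
}

func main() {
	// Sample input where the same OSD appears in both arrays.
	raw := []byte(`{
		"nodes": [{"id": 40, "name": "osd.40", "type": "osd", "status": "down"}],
		"stray": [{"id": 40, "name": "osd.40", "type": "osd", "status": "down"}]
	}`)

	var tree osdTree
	if err := json.Unmarshal(raw, &tree); err != nil {
		panic(err)
	}

	// Deduplicate by OSD ID so each OSD is emitted at most once.
	seen := make(map[int]bool)
	for _, n := range append(tree.Nodes, tree.Stray...) {
		if n.Type != "osd" || seen[n.ID] {
			continue // skip buckets and OSDs already reported
		}
		seen[n.ID] = true
		fmt.Printf("ceph_osd_down{osd=%q,status=%q} 1\n", n.Name, n.Status)
	}
}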
We have more than 30 clusters, so here is the output from just one of them... It seems there are duplicate values.
- collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.56" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
- collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.169" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
- collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.45" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
- collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.158" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
- collected metric ceph_osd_down label:<name:"cluster" value:"xxxx-region-1" > label:<name:"osd" value:"osd.49" > label:<name:"status" value:"up" > gauge:<value:1 > was collected before with the same name and label values
Hi, would it be possible to share the output of ceph osd tree down -f json-pretty on your cluster? I have a hunch that OSDs in the nodes and stray arrays might have overlapped.
How do I resolve this problem? It was caused by using the command "ceph osd crush link xxxx", which (unlike crush move) places an entry under an additional CRUSH location without removing it from its original one, so its OSDs can appear more than once in the tree output.
Hi, this shouldn't be an issue anymore with our latest nautilus-based release. Please re-open if you experience it again.