digitalocean / ceph_exporter

Prometheus exporter that scrapes meta information about a ceph cluster.

[ERROR] Unable to collect data from ceph osd df rados: Invalid argument

gopherunner opened this issue

After building the ceph_exporter Go project, I tried to run the ceph_exporter binary, and I'm getting the following error messages:

root@cm03:/# /usr/local/go/bin/ceph_exporter
2017/05/18 14:33:48 Starting ceph exporter on ":9128"
2017/05/18 14:33:55 [ERROR] cannot extract total bytes: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 [ERROR] cannot extract used bytes: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 [ERROR] cannot extract available bytes: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 failed collecting cluster health metrics: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 [ERROR] Unable to collect data from ceph osd df rados: Invalid argument
2017/05/18 14:33:55 failed collecting osd metrics: rados: Invalid argument

Any ideas?

I'm getting the same error when running:

/ceph_exporter -ceph.config "/etc/ceph/ceph.conf" -ceph.user "admin" -exporter.config "/etc/ceph/exporter.yml"

Same issue here

[root@mds01ceph ~]# systemctl -l status ceph-exporter
● ceph-exporter.service - Ceph Exporter Service
Loaded: loaded (/usr/lib/systemd/system/ceph-exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2020-03-12 12:15:31 +08; 3min 0s ago
Process: 2184 ExecStartPre=/usr/bin/docker rm ceph_exporter (code=exited, status=0/SUCCESS)
Process: 1146 ExecStartPre=/usr/bin/docker kill ceph_exporter (code=exited, status=1/FAILURE)
Main PID: 2196 (docker)
Tasks: 14
Memory: 95.4M
CGroup: /system.slice/ceph-exporter.service
└─2196 /usr/bin/docker run --name ceph_exporter -v /etc/ceph:/etc/ceph --net=host -p=9128:9128 ceph_exporter

Mar 12 12:17:30 mds01ceph docker[2196]: 2020/03/12 04:17:30 [ERROR] cannot extract total objects: strconv.ParseFloat: parsing "": invalid syntax
Mar 12 12:17:30 mds01ceph docker[2196]: 2020/03/12 04:17:30 failed performing PG dump brief: json: cannot unmarshal object into Go value of type collectors.cephPGDumpBrief
Mar 12 12:17:45 mds01ceph docker[2196]: 2020/03/12 04:17:45 [ERROR] cannot extract total objects: strconv.ParseFloat: parsing "": invalid syntax
Mar 12 12:17:45 mds01ceph docker[2196]: 2020/03/12 04:17:45 failed performing PG dump brief: json: cannot unmarshal object into Go value of type collectors.cephPGDumpBrief

I am having the same problem. For now I simply added StandardError=null to the systemd service, to avoid flooding the system log. However, this will also hide any other error message.

2020/06/16 18:43:43 failed performing PG dump brief: json: cannot unmarshal object into Go value of type collectors.cephPGDumpBrief

I believe this was fixed. If you're running a Nautilus cluster, make sure you're not using the Luminous exporter.

There were a few changes from Luminous to Nautilus:

  • total objects error: caused by changes in the ceph df output
  • unmarshal error on ceph pg dump pgs_brief: caused by a change in the JSON output
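To illustrate the second point: Go's encoding/json returns "cannot unmarshal object into Go value of type ..." when the target type expects a JSON array but the input is a JSON object. The following is only a rough sketch of a decoder that tolerates both shapes, not the exporter's actual code. The pgStat type, the decodePGDumpBrief helper, and the "pg_stats" wrapper key are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pgStat is a hypothetical, trimmed-down PG entry used only for this sketch.
type pgStat struct {
	PGID  string `json:"pgid"`
	State string `json:"state"`
}

// decodePGDumpBrief accepts either a bare JSON array of PG entries or an
// object wrapping the array under "pg_stats" (the wrapper key is an assumption).
func decodePGDumpBrief(data []byte) ([]pgStat, error) {
	// First try the bare-array shape.
	var stats []pgStat
	if err := json.Unmarshal(data, &stats); err == nil {
		return stats, nil
	}

	// Fall back to the wrapped-object shape.
	var wrapped struct {
		PGStats []pgStat `json:"pg_stats"`
	}
	if err := json.Unmarshal(data, &wrapped); err != nil {
		return nil, fmt.Errorf("unrecognized pg dump output: %w", err)
	}
	return wrapped.PGStats, nil
}

func main() {
	arrayShape := []byte(`[{"pgid":"1.0","state":"active+clean"}]`)
	objectShape := []byte(`{"pg_stats":[{"pgid":"1.0","state":"active+clean"}]}`)

	for _, in := range [][]byte{arrayShape, objectShape} {
		stats, err := decodePGDumpBrief(in)
		fmt.Println(len(stats), err)
	}
}
```

Trying the array shape first keeps older output working, and the fallback only runs when that decode fails.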

FWIW my clusters are Infernalis and Nautilus, and I'm working to sunset the former. I've deployed Nautilus admin nodes on the Infernalis clusters for uniformity; the Docker image I'm using is for sure from the Nautilus branch from ... late June or early July. That said, I've never really understood the branch strategy. It's susceptible to divergence, which we've definitely seen. It would seem cleaner to have just one branch with internal logic that acts differently based on the installed version. Upstream randomly changing things (notably time skew) makes this a real moving target.
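A rough sketch of what such version-based logic might look like, assuming the exporter can obtain a version string in the form printed by `ceph --version`. The cephMajor and pgDumpIsWrapped helpers are hypothetical names and are not part of ceph_exporter; the threshold of 14 (Nautilus) reflects the Luminous-to-Nautilus JSON change mentioned in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cephMajor extracts the major release number from a version string such as
// "ceph version 14.2.9 (...) nautilus (stable)". Hypothetical helper.
func cephMajor(version string) (int, error) {
	fields := strings.Fields(version)
	for i, f := range fields {
		if f == "version" && i+1 < len(fields) {
			// Keep only the part before the first dot, e.g. "14" from "14.2.9".
			major := strings.SplitN(fields[i+1], ".", 2)[0]
			return strconv.Atoi(major)
		}
	}
	return 0, fmt.Errorf("no version number in %q", version)
}

// pgDumpIsWrapped reports whether the pg dump JSON should be parsed with the
// newer format. Assumes the change landed in Nautilus (major version 14).
func pgDumpIsWrapped(major int) bool {
	return major >= 14
}

func main() {
	for _, v := range []string{
		"ceph version 12.2.13 (...) luminous (stable)",
		"ceph version 14.2.9 (...) nautilus (stable)",
	} {
		major, err := cephMajor(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(major, pgDumpIsWrapped(major))
	}
}
```

Keying the decision on the major number rather than the release name avoids maintaining a name table, at the cost of assuming format changes only happen at major releases.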

I never had to look at Infernalis, but this is likely the issue: a change in the pg dump pgs_brief -f json output.

100% agreed on the multi-branch point. Our next step for ceph_exporter is to merge everything into one main branch before we start working on Octopus compatibility, but it will take a bit of work to make that happen.

As Alex mentioned above, this should no longer be a problem with newer versions. We also have the 4.0-dev branch which supports Nautilus, Octopus, and Pacific. We should have a release candidate soon. :)