xperimental / netatmo-exporter

Prometheus exporter for Netatmo sensor data.

Data is missing from time to time

AlfaJackal opened this issue · comments

As you can see in the image, there is no data from time to time. Are you experiencing the same?

image

If so, do you know how to fix it? For normal graph visualizations I can fill those gaps. Unfortunately, I don't know how to close the gaps on a single stat panel.

Hi @AlfaJackal , thanks for the message.

I have noticed this issue coming up more often now as well, but I have not looked into it yet. I'm planning to do some updates to the exporter soon and will probably take a look at this issue while I'm at it.

I've added some more logging to the new version. I don't know why I omitted the error log previously. You can run the new version as well if you want to have a look at the error.

Unfortunately (?) I have not had an error yet with the logging in place, so I don't have any information yet on why it fails.

I am receiving these error messages now with the newest version! Sorry for posting a screenshot! Where do I find the log in the Docker container?

image

Sorry, apparently I forgot to wire up the last change correctly and did not test with an up-to-date build. This new issue should be fixed now.

The log is not written to a file, so there is no log file you can download from the container.

The screenshot seems to be from the Docker front-end of a Synology. There's an "export" button at the top left of the log.

First time I've noticed that button. 😜
It is up and running again.

I have been running it since the 21st of June and so far I have had five errors with
ERRO Error getting data: Bad HTTP return code 500

Anything else I can provide you with?

Maybe this is related to Netatmo servers? It seems that they are down very often, but I cannot validate the times.

I see the same error message. It seems to be followed by the Netatmo API returning old data (this triggers the "stale data" logs) until it fixes itself again.
I'll soon release another version which will also show the error message returned by Netatmo, but I'm pretty sure the issue is on their end, as we seem to be getting wrong data. I also had an issue yesterday where the API did not return any data for my stations for a few hours.

-- edit: I have already been working on changing the behaviour of the exporter a bit so that it caches the data internally for a while. This reduces the load on the Netatmo API if the query interval of Prometheus is not set to an extended value. It will probably also reduce the impact of this error.

Unfortunately, the responses from the Netatmo API only contained a JSON document with the internal service error message encoded in it and no further information. As the errors seem to be more frequent around midnight (UTC+2), my guess is that something is producing additional load on the API during that time.

I've just merged the caching code into master, if you like you can also test this version. This will not "fix" the issue, as the cause is on the side of the Netatmo API itself, but it should make it less pronounced in the metrics, because the exporter will not try to fetch new data all the time and instead just use old data it already has (until the data is old enough to be considered "stale").
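As a rough illustration of the caching behaviour described above, here is a minimal sketch in Go. It is not the exporter's actual code: the `reading` type, the `cache` struct, the `simulateScrapes` helper and the interval values are all made up for the example. The idea is only that scrapes inside the refresh interval are served from memory, and that a failed refresh falls back to the data already held.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reading is a hypothetical stand-in for whatever the API returns.
type reading struct {
	Temperature float64
}

// cache keeps the last successful result and only calls fetch again
// once refreshInterval has passed since the last successful update.
type cache struct {
	mu              sync.Mutex
	fetch           func() (reading, error)
	now             func() time.Time // injectable clock for the simulation
	refreshInterval time.Duration
	data            reading
	updated         time.Time // zero until the first successful fetch
}

// get returns cached data while it is fresh. If a refresh fails, it
// returns the previous data together with the error.
func (c *cache) get() (reading, time.Time, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if !c.updated.IsZero() && c.now().Sub(c.updated) < c.refreshInterval {
		return c.data, c.updated, nil
	}
	fresh, err := c.fetch()
	if err != nil {
		return c.data, c.updated, err
	}
	c.data, c.updated = fresh, c.now()
	return c.data, c.updated, nil
}

// simulateScrapes runs a number of scrapes spaced stepSeconds apart and
// returns how often the (fake) API was actually called.
func simulateScrapes(scrapes, stepSeconds, refreshSeconds int) int {
	calls := 0
	clock := time.Unix(0, 0)
	c := &cache{
		fetch: func() (reading, error) {
			calls++
			return reading{Temperature: 21.5}, nil
		},
		now:             func() time.Time { return clock },
		refreshInterval: time.Duration(refreshSeconds) * time.Second,
	}
	for i := 0; i < scrapes; i++ {
		_, _, _ = c.get()
		clock = clock.Add(time.Duration(stepSeconds) * time.Second)
	}
	return calls
}

func main() {
	// 16 scrapes, one per minute, with an 8 minute refresh interval.
	fmt.Println(simulateScrapes(16, 60, 480)) // prints 2
}
```

In this sketch the timestamp returned by `get` plays the role that `netatmo_cache_updated_time` plays in the exporter: it only moves forward when a fetch actually succeeds.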

You can still track whether the updates work using the netatmo_up metric. The new netatmo_cache_updated_time should periodically increase showing when the data is actually updated.

Awesome, will give it a shot! First thing I created in Grafana was a netatmo_up graph. 😉 And a netatmo_cache_updated_time stat card.

There seems to be a large "drift" in the age of the sensor data provided by the Netatmo API during the night. netatmo_sensor_updated reads the timestamp of the data as provided from the API and as far as I understand this should ideally always be below 10min, as the data in the API is updated about every 10min (per their documentation).

I'm measuring >50min age of the sensor data during the night (GMT+2), though. I'm using this query to identify the drift between the time the cache was updated and the time of the sensor data (result in minutes):

avg(scalar(netatmo_cache_updated_time) - netatmo_sensor_updated) / 60

For me this produces a graph which increases every 12h with the peaks around 10:00 GMT and 23:00 GMT.
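For anyone who wants to check the numbers by hand, the query boils down to simple arithmetic on Unix timestamps. The following Go sketch mirrors it under the assumption that both metrics are timestamps in seconds; the function name `avgAgeMinutes` and the sample values are made up for illustration.

```go
package main

import "fmt"

// avgAgeMinutes mirrors the query: it subtracts each sensor timestamp
// from the cache update time, averages the differences and converts
// the result from seconds to minutes.
func avgAgeMinutes(cacheUpdated float64, sensorUpdated []float64) float64 {
	if len(sensorUpdated) == 0 {
		return 0
	}
	var sum float64
	for _, ts := range sensorUpdated {
		sum += cacheUpdated - ts
	}
	return sum / float64(len(sensorUpdated)) / 60
}

func main() {
	// Made-up timestamps: one sensor is 5 minutes old, one is 55 minutes old.
	cacheUpdated := 1593900000.0
	sensors := []float64{cacheUpdated - 300, cacheUpdated - 3300}
	fmt.Printf("%.1f min\n", avgAgeMinutes(cacheUpdated, sensors)) // prints "30.0 min"
}
```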

The exporter ignores old sensor data by discarding information where the age is larger than the "stale duration". I had previously set the default for this to 30min, which seemed to work over the past years. I've now increased the default to 60min to account for the drift found this week.
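To make the effect of that change concrete, here is a small Go sketch of a stale-data filter. It is an illustration only and not taken from the exporter: `sensorReading`, `keepFresh`, `freshCount` and the sample ages are hypothetical, and only the 30min and 60min values come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

const (
	oldDefaultStale = 30 * time.Minute // previous default
	newDefaultStale = 60 * time.Minute // increased default
)

// sensorReading is a hypothetical stand-in for one module's data.
type sensorReading struct {
	Module  string
	Updated time.Time
}

// keepFresh drops every reading whose age at time now is larger than
// staleDuration and returns the remaining ones.
func keepFresh(readings []sensorReading, now time.Time, staleDuration time.Duration) []sensorReading {
	var fresh []sensorReading
	for _, r := range readings {
		if now.Sub(r.Updated) <= staleDuration {
			fresh = append(fresh, r)
		}
	}
	return fresh
}

// freshCount applies keepFresh to two made-up readings that are
// 5 and 50 minutes old and reports how many survive.
func freshCount(staleDuration time.Duration) int {
	now := time.Date(2020, time.July, 5, 22, 0, 0, 0, time.UTC)
	readings := []sensorReading{
		{Module: "indoor", Updated: now.Add(-5 * time.Minute)},
		{Module: "outdoor", Updated: now.Add(-50 * time.Minute)},
	}
	return len(keepFresh(readings, now, staleDuration))
}

func main() {
	fmt.Println(freshCount(oldDefaultStale), freshCount(newDefaultStale)) // prints "1 2"
}
```

With a 50 minute old timestamp, the reading is dropped under the 30min threshold and kept under the 60min one, which is the kind of gap that shows up in the panels.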

What bothers me is that this "age drift" should also be visible in the data itself (sensor values displaying old data), but I have not seen this yet, so my assumption is that there is some kind of caching bug in the Netatmo API itself. This would also be an explanation for the HTTP 500 results that are returned sometimes.

Can you tell me if you also have a similar age graph in your data and if the increase in the stale duration fixes the display issues? The stale duration is also a configuration option if the new default is still not enough.

It seems that I have a very similar graph. Please have a look at mine, which is based on your query:
image

For me the maximums of the graph are much higher, at least for the previous weeks. Over the last few days the reported times have stayed much more within the interval I would have expected (which is what I am seeing in your graph as well; everything below 15min).

I had another big spike in the time drift on 2020-07-05, starting at ~10:00 GMT and ending at ~15:00.

I don't know if the exporter can do anything if the data it gets is bad. I wonder why the API reports such old updated times even though the data seems to be newer.

Do you have any other idea or is the current version working "good enough" for you?

I think this version is good enough! I monitored it over the last few days and it looks pretty good so far in comparison to my other Netatmo Exporter for InfluxDB. Thank you!

One last question, a bit off-topic: How do you calculate netatmo_sensor_rain_amount_mm for 24h? It seems that the amount in mm is not comparable to the values in the Netatmo app.

Sorry, I forgot about the question regarding "rain". The collector does not do any calculation on the value returned from the Netatmo API (see here). Unfortunately for me, the Netatmo documentation is also not very detailed on what number is stored in that property (see here). I don't have a rain (or wind) sensor myself, so unfortunately I cannot check this on my own.

If you can provide me with some examples, I can maybe come up with some improvements. Maybe it's simply the wrong value to use. I'd say this would be a discussion for a new issue though.