technyon / nuki_hub

Use an ESP32 as a Hub between a NUKI Lock and your smarthome.

Help needed: Lock.open executed multiple times

Caros2017 opened this issue

First of all, thanks for this excellent piece of work and keeping it up to date! :)

I don't think this is a bug, but I want to know what is happening.
I have been using Nukihub for a while now, but I have never used it to open the door, only to lock and unlock.

For the past week I have been using lock.open within HomeAssistant (the lock is connected through Mosquitto MQTT, SSL encrypted) and I noticed that quite often the lock opens twice or even three times in a row (about 5-8 seconds apart). This is definitely not desirable, since my door physically opens once the lock opens.

I went through my settings and saw that I had these settings set, because I wanted to be sure the lock locks and unlocks at given events.
Number of retries if command failed: 2
Delay between retries (milliseconds): 100

This could explain the behaviour, but I would like to know how I can troubleshoot what is happening:

  • How can I debug if a command failed?
  • Is there a way to log every lock.open state? So far HomeAssistant is only logging lock.lock and lock.unlock

I can imagine that 100ms is too fast to even check whether the command failed.

Otherwise I have a feature request:

  • Enable retries for lock.lock and lock.unlock separately from lock.open

I am currently running on these versions:
Nuki lock: 3.7.7
Nukihub: 8.28

I use this config, never had any repetition issue operating the lock through Home Assistant.

Make sure you don't have any automation for the lock that is interfering. Regarding the debugging: the lock is operated through autodiscovery, that means through MQTT, so the only way is using HA debugging features and monitoring the MQTT broker topics (through MQTT Explorer).

image

You should enable MQTT logging to see more information on what's going wrong.

I use this config, never had any repetition issue operating the lock through Home Assistant.

Make sure you don't have any automation for the lock that is interfering. Regarding the debugging: the lock is operated through autodiscovery, that means through MQTT, so the only way is using HA debugging features and monitoring the MQTT broker topics (through MQTT Explorer).

Thanks for your reaction. I have set the same settings as you and then it is possible that the door opens three times in a row.

You should enable MQTT logging to see more information on what's going wrong.

Also thanks for the reply! For now I have disabled retries by setting the number to zero, and I have never faced a moment where a command didn't go through.

For me it seems like it's retrying even if it mostly succeeds.

It does not do that normally, so we would need the logs to see what's going wrong in your setup. (Alternatively you could log using the serial connection, but that is usually more cumbersome.)

I have set the same settings as you and then it is possible that the door opens three times in a row.

I don't understand: you applied my settings, and what happened? It retried 3 times?

I'm pretty confident that you have another app/sw/automation that is doing strange things. Make sure the Nuki Hub is the ONLY controller of the lock. You are not using the Nuki Bridge, are you?

It does not do that normally, so we would need the logs to see what's going wrong in your setup. (Alternatively you could log using the serial connection, but that is usually more cumbersome.)

I had to find time to reproduce. Meanwhile I have updated my lock to 8.31. Just enabled MQTT logging and was able to catch an event where my lock opened three times. One time the first action and then two retries.

I use this config, never had any repetition issue operating the lock through Home Assistant.

Make sure you don't have any automation for the lock that is interfering. Regarding the debugging: the lock is operated through autodiscovery, that means through MQTT, so the only way is using HA debugging features and monitoring the MQTT broker topics (through MQTT Explorer).

Thanks for the reply. To make troubleshooting clean I have set my settings to exactly the same as you: 3/300. Unfortunately I have had the same issues (as expected). I can guarantee you that there is no other automation interfering. I used only lock.lock and lock.open and disabled flows using lock.open for the test. I have added the time-out from the MQTT logging.

I have set the same settings as you and then it is possible that the door opens three times in a row.

I don't understand: you applied my settings, and what happened? It retried 3 times?

I'm pretty confident that you have another app/sw/automation that is doing strange things. Make sure the Nuki Hub is the ONLY controller of the lock. You are not using the Nuki Bridge, are you?

It's exactly what you say. I applied your settings and it retried multiple times. Meaning: My lock opens physically multiple times in a row, when given the command to open once.

I think we have to agree to disagree ;). I am pretty confident that the retry isn't working properly. I don't own an original Nuki hub and the Bluetooth on my phone is always off.

I have attached what is happening. It seems that nukihub thinks the command failed, then retries. Unfortunately, in real life the command doesn't fail. For me this results in a door that opens multiple times.

Nuki_unlatch_log
Nukihub_state
Nukihub_ack

It's not a matter of disagreeing, it's Computer Science, so we need to understand what's going on. :)

From what I know, if Nuki Hub reports this:
image

It means the lock replied with an error. But it's better if @technyon confirms this. I don't know if it's just a timeout expiring or an error returning from the lock via BT.

Once we know that, we can try to understand WHY this happens.

Yes, it means an error occurred; it could be either a timeout or the lock reporting a problem. It tried twice to send the command and failed, then the third time it worked.

If it's a timeout, it could be a BT communication issue: how long is the timeout?

If it's an error, it means the lock explicitly returns the error, correct? Is it possible to see in the log if it's a timeout or an error? It's important to better debug the problem.

I bet on the BT communication issue / timeout. :)

@Caros2017: what's the BT RSSI value? I suspect BT connection with the lock is not very good.

It's a timeout, it says "Lock action result: timeout", that's 10 seconds.

Since this seems to be a problem in communication I'd start to check there. Is the ESP placed close enough to the lock?

Sorry...I missed that on the phone, I only saw the failed command message. My bad.

So indeed it's a communication issue. @Caros2017 check your BT...how far is the hub from the lock? Please provide the BT RSSI value...

Yes, it means an error occurred; it could be either a timeout or the lock reporting a problem. It tried twice to send the command and failed, then the third time it worked.

What I don't understand: it says that the third time it worked, but actually it worked three times in a row. My door opened at the first command, the first retry and the second retry.

@Caros2017: what's the BT RSSI value? I suspect BT connection with the lock is not very good.

The lock is in direct line of sight of the ESP. They are less than two meters apart.

Nukihub_rssi

What I don't understand: it says that the third time it worked, but actually it worked three times in a row. My door opened at the first command, the first retry and the second retry.

If the lock doesn't return a valid ACK to the command, the Hub retries...

So it seems the lock received the command, it executes it, but fails to return the ACK before timeout expires.

@technyon can you check the retry mechanism and make sure that timeout is not too strict, and also, before retry, can you check the state of the lock to make sure it's in the state we expect it? This way we would know if the state transitioned after timeout expired.
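
A minimal sketch of the check suggested above, in Python rather than the hub's own firmware code. Everything here is hypothetical: `send_command`, `read_state` and the state names are placeholders for illustration, not Nuki Hub or Nuki library APIs. The idea is only that a missing ACK leads to a state query first, and the command is resent only if the lock is not already where it should be.

```python
import time
from typing import Callable, Set


def send_with_state_check(
    send_command: Callable[[], bool],   # hypothetical: returns True if an ACK arrived in time
    read_state: Callable[[], str],      # hypothetical: returns the lock's current state
    expected_states: Set[str],          # states that count as "the command was executed"
    retries: int = 3,
    delay_s: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send a command with retries; return how many times it was actually sent."""
    sent = 0
    for attempt in range(retries + 1):
        acked = send_command()
        sent += 1
        if acked:
            break
        # No ACK (timeout or error): check the lock before resending,
        # so an executed command with a lost ACK is not repeated.
        if read_state() in expected_states:
            break
        if attempt < retries:
            sleep(delay_s)
    return sent
```

For an unlatch the resulting state is only held briefly, so a real check would probably have to accept more than one state or look at a recent state change; that part is left open here.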

But nukihub received the ack. In the above screenshots you can see that at 8:08:21 it sends unlatch and receives the ack at the same second.

At least that's how I interpret the MQTT log :)

But nukihub received the ack

to which unlatch command? the first/second or third? ;)

The log should be more verbose, and specify retry # etc.

If I look at the timestamps, it has to be the first one. It seems to be the actual timestamp.

8:08:21 -> Unlatch -> ACK

I'm not sure...I think the timeout (300ms, it seems) may be too strict in some cases. The timestamps only show seconds; we need millisecond timestamps to understand precisely what's going wrong. And also the retry # when it retries.
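
To illustrate what such a log line could look like, here is a small Python sketch that formats an event with millisecond resolution and an attempt number. The function name and the line layout are assumptions made for this example; this is not the format Nuki Hub actually logs.

```python
from datetime import datetime, timezone


def format_log_line(epoch_s: float, attempt: int, event: str) -> str:
    """Format one log line as HH:MM:SS.mmm (UTC) plus the attempt number."""
    t = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
    millis = t.microsecond // 1000
    return f"{t:%H:%M:%S}.{millis:03d} attempt={attempt} {event}"


# 08:08:21 and a quarter second, expressed as seconds since midnight UTC on day 0
print(format_log_line(8 * 3600 + 8 * 60 + 21 + 0.25, 1, "unlatch sent"))
# → 08:08:21.250 attempt=1 unlatch sent
```

With lines like this, a command and an ACK that both show "8:08:21" could be ordered and matched to a specific attempt.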

I have faced the same problems (multiple retries) when I set the timeout at 5000ms (5 seconds).

You set it to 300ms now? What ESP device are you using?

For the example logging above I did indeed use a 300ms delay with 3 retries.

For now I have disabled retries, since it's almost unusable. Most of the time when I open the door, it opens again once or twice.

Can you set it to 3000 and retry please? If it retries, it means the timeout logic is not working, or the ack is not understood, or there's a bug. But it doesn't happen to us. :)

I can confirm that's the case. I already experienced that. Do you need logging with that? If so I need to find some time and am wondering what you need :)

I can confirm that's the case

I gave 3 options: which one are you referring to? Also, what device are you using for the hub?

For debugging the code etc., we need the dev: @technyon. He'll tell you what he needs.

That number is about how long to wait between retries. The actual timeout is hardcoded and is 10 seconds. I've enabled logging for the NUKI library, but that will result in a crazy amount of information which is quite hard to read. I didn't write the library, so it's hard to read for me too; let's see how far we get. Also, these are serial logs, not MQTT, since the logging is outside of my code. You can use for example HTERM and set it to 115200 baud.
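
A rough Python sketch of how the two values combine, under two assumptions that the thread does not confirm: that "retries" means additional attempts after the first one, and that every attempt waits for the full 10 second timeout. The function name is made up for the example.

```python
def worst_case_seconds(retries: int, delay_ms: int, timeout_s: float = 10.0) -> float:
    """Upper bound on how long one command can take if every attempt times out."""
    attempts = retries + 1                      # assumption: first attempt plus retries
    waiting_for_ack = attempts * timeout_s      # each attempt waits for the full timeout
    between_retries = retries * delay_ms / 1000.0
    return waiting_for_ack + between_retries


print(worst_case_seconds(retries=3, delay_ms=300))
```

Under these assumptions the configured delay contributes less than a second in the 3/300 case, so changing it has little effect on how long the hub waits for an ACK.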

nuki_hub-8.32-dbg-1.zip

Thanks. I need to find time to be able to do the debugging. Unfortunately it's not going to be coming week(s).

Short-term solution for me: I have set retries to zero and made a flow in Node-RED which checks the required state and corrects it if it doesn't match.

Result: I have never experienced any difference between the desired state and the actual state.
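
The logic of such a flow can be sketched in a few lines of Python. This is not the actual Node-RED flow, which is not shown in the thread; the state names and the command mapping are assumptions for illustration. The sketch deliberately covers only lock and unlock, since an open is a momentary action rather than a state that can be compared afterwards.

```python
from typing import Optional

# Hypothetical mapping from a desired state to the command that reaches it.
COMMAND_FOR_STATE = {"locked": "lock", "unlocked": "unlock"}


def reconcile(desired: str, actual: str) -> Optional[str]:
    """Return the command to send, or None if the lock is already in the desired state."""
    if actual == desired:
        return None
    return COMMAND_FOR_STATE[desired]


print(reconcile("locked", "unlocked"))  # → lock
print(reconcile("locked", "locked"))    # → None
```

Because a command is only sent when the reported state differs from the desired one, a lost ACK cannot cause a repeat by itself.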

Does that mean you still get the timeout error in the log, yet the lock opens?

That's exactly the case.

That's strange. So it seems that indeed the lock received and executed the command, but somehow the ESP didn't receive the confirmation. It's hard to say why that happens, so far you're the only one who reported this. Which lock are you using, and is your lock firmware up to date?

Type: Nuki 3.0 Pro
Nuki lock version: 3.7.7
Nukihub version: 8.28

I have never noticed it, since I haven't used the open command earlier. Other commands that repeat multiple times are no problem.

For now I have solved this in my flows. I just check if the current state equals the requested state. So far this hasn't been off. So the retry probably isn't necessary for me, since it gives me more problems than it solves :)

Workarounds are ok, but there is indeed a problem: your Nuki receives commands but we don't know why it doesn't send ACKs. But the strange thing is that the rest is working, I mean status updates etc.

It's really weird, and nobody else has ever reported an issue like that.

Did you try another ESP device?

No, I haven't. I bought multiple ESPs, but they were either ESP-S3 or ESP8266. Then I realized ESP-S3 is not supported :( So this is the only one I have for this project, which I had lying around (M5Stack Atom Lite).

I'm also thinking this: tasmota devs told me (talking about other projects) that for the ESP32, doing bridging of BT+Wifi and also MQTT is a very heavy task. When these tiny cpus are stressed, power stability becomes very important. How are you powering the Atom? I'm thinking maybe when it's executing some code and updating mqtt etc., it loses the BLE response beacons from the lock...but it's just pure speculation. :)

I use only Atoms here...I like M5Stack products.

@Caros2017 I'm sorry I can't fix this on ESP side, but I can't reproduce this. Since you found a workaround, I'm closing here for now.