home-assistant-libs / pytradfri

IKEA Trådfri/Tradfri API. Control and observe your lights from Python. Examples available. On pypi. Sans-io.

Gateway update

hmax42 opened this issue · comments

Where are you using pytradfri (eg stand-alone, Home Assistant etc)

Standalone, python3, self coded mqtt adapter script

Version of pytradfri

newest, updated from github

Finished processing dependencies for pytradfri==7.0.6

Backend used (aiocoap, libcoap)

libcoap, built via script on 12.0.6.2021

Expected behaviour

work as before ikea updated the gateway

Actual behaviour

it crashes like this, sometimes after getting the devices, sometimes after getting the groups
the time of the crash varies wildly
i run my script and it may crash within 5 minutes, or it may run for a few hours

i first got this error while sorting the groups list, then i added the "print(groups)" for debugging, which is now the failing point

  File "./mqtt2tradfri_p3.py", line 293, in <module>
    print (groups)
  File "/usr/local/lib/python3.7/dist-packages/pytradfri/group.py", line 181, in __repr__
    state = "on" if self.state else "off"
  File "/usr/local/lib/python3.7/dist-packages/pytradfri/group.py", line 43, in state
    return self.raw.get(ATTR_DEVICE_STATE) == 1
AttributeError: 'NoneType' object has no attribute 'get'

Code snippet

       try:
           groups_command = gateway.get_groups()
           groups_commands = api(groups_command)
           groups = api(groups_commands)
       except Exception:
           print("error groups")
       if groups:
           print (groups)
           print("Id\tState\tDimmer\tName")
           print(" Memb.\tState\tDimmer\tName")
           try:
               groups = sorted(groups, key=lambda x: int(x.id))
           except Exception:
               print("Except while sorting")
           for g in range(len(groups)):

my app shows a gateway update on 30th June, to 1.15.34
one of the changes is "Bug fixes - CoAP issues"
guess i found them

i know i cannot expect code updates that fast, but i did not see an issue for this, so i created one.

I'm also having issues with the latest gateway update. It still works but is super slow to react to commands being sent.

No issues on my gateway using 1.15.34 so the error from @thomasdelaet above might be unrelated to the update.

I have the same observations. After updating to 1.15.34 the whole installation actually broke down, and I had to disable the Tradfri integrations in my home automation system. I get errors similar to the OP's. Moreover, after the update all CoAP communication became extremely slow:

coap-client -m put -u "[redacted]" -k "[redacted]" -e '{ "3311": [{ "5851": 127 }] }' "coaps://192.168.88.66:5684/15001/65575" -v 7

which should set the brightness of the bulb with ID 65575 to ca. 50%, takes about 1 second, with this interesting log line:

192.168.88.50:56754 <-> 192.168.88.66:5684 DTLS: DTLS retransmit timeout

Also, listing groups and devices using the "example_sync.py" script takes ages (something like 20 seconds) before the CLI shows up. Everything was working flawlessly before June 30th :/

Good (or rather not) to hear that it's (probably) not my setup's fault :/

i am not experiencing super-slowness, but sometimes commands feel ignored.
i tried to lower a blind just now, because it opened after i reinserted the charged power cell, and i had to send the close command twice.
PS: it crashed after i got up today, but before my code was due to open the blinds. right now it's running flawlessly again
PPS: the gateway seems to have crashed, i.e. it was not responding to the lib, and the app didn't find it either. turned it off and on again. now the lib works again

@hmax42 thanks for the update, closing this issue

@ggravlingen i meant, the lib and the app can connect again. but restarts are only a temporary solution. i restarted yesterday as well.

@hmax42 (and maybe @ggravlingen) can you check the timings on the COAP level? I mean using the coap-client app?

Since June 30th I get something like this (take a look at the DTLS retransmit timeout message, exactly after a 1 second delay):

theta::bartek:~ $ coap-client -m put -u "[redacted]" -k "[redacted]" -e '{ "3311": [{ "5851": 3 }] }' "coaps://192.168.88.66:5684/15001/65575" -v 7
Jul 03 11:40:10.102 DEBG ***192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: new outgoing session
Jul 03 11:40:10.102 DEBG CoAP Client restricted to (D)TLS1.2 with Identity Hint callback
Jul 03 11:40:10.102 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:before SSL initialization
Jul 03 11:40:10.102 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: sent 275 bytes
Jul 03 11:40:10.102 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write client hello
Jul 03 11:40:10.102 DEBG sending CoAP request:
Jul 03 11:40:10.102 DEBG ** 192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: tid=5041: delayed
Jul 03 11:40:10.102 DEBG timeout is set to 90 seconds
Jul 03 11:40:10.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 60 bytes
Jul 03 11:40:10.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write client hello
Jul 03 11:40:10.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:DTLS1 read hello verify request
Jul 03 11:40:10.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: sent 307 bytes
Jul 03 11:40:10.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write client hello
Jul 03 11:40:10.104 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 101 bytes
Jul 03 11:40:10.104 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 25 bytes
Jul 03 11:40:11.102 DEBG ** 192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: DTLS retransmit timeout
Jul 03 11:40:11.102 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: sent 307 bytes
Jul 03 11:40:11.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 101 bytes
Jul 03 11:40:11.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write client hello
Jul 03 11:40:11.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS read server hello
Jul 03 11:40:11.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS read server done
Jul 03 11:40:11.103 DEBG got psk_identity_hint: ''
Jul 03 11:40:11.103 INFO Identity Hint '' provided
Jul 03 11:40:11.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write client key exchange
Jul 03 11:40:11.103 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write change cipher spec
Jul 03 11:40:11.104 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: sent 126 bytes
Jul 03 11:40:11.104 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write finished
Jul 03 11:40:11.104 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 25 bytes
Jul 03 11:40:11.106 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 14 bytes
Jul 03 11:40:11.106 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS write finished
Jul 03 11:40:11.106 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 53 bytes
Jul 03 11:40:11.107 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS read change cipher spec
Jul 03 11:40:11.107 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL_connect:SSLv3/TLS read finished
Jul 03 11:40:11.107 DEBG ***EVENT: 0x01de
Jul 03 11:40:11.107 DEBG ***192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: session connected
Jul 03 11:40:11.107 DEBG ** 192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: tid=5041: transmitted after delay
Jul 03 11:40:11.107 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: sent 73 bytes
v:1 t:CON c:PUT i:13b1 {} [ Uri-Path:15001, Uri-Path:65575 ] :: '{ "3311": [{ "5851": 3 }] }'
Jul 03 11:40:11.107 DEBG ** 192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: tid=5041: added to retransmit queue (2156ms)
Jul 03 11:40:11.108 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 33 bytes
v:1 t:ACK c:2.04 i:13b1 {} [ ]
Jul 03 11:40:11.108 DEBG ** 192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: tid=5041: removed
Jul 03 11:40:11.108 DEBG ** process incoming 2.04 response:
Jul 03 11:40:11.108 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: sent 31 bytes
Jul 03 11:40:11.108 INFO *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: SSL3 alert write:warning:close notify
Jul 03 11:40:11.108 DEBG ***EVENT: 0x0000
Jul 03 11:40:11.108 DEBG ***192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: session closed

This shows the 1 second delay. Listing all devices takes much longer.
Just to add that restarting the gateway doesn't/didn't help.
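
To locate such a delay without reading the whole dump, the timestamps can be compared line by line. The sketch below assumes the `coap-client -v 7` format shown above (`Jul 03 11:40:10.102 ...`, English month abbreviations, no year); `largest_gap` is a hypothetical helper name.

```python
from datetime import datetime


def largest_gap(log_lines):
    """Return (gap_in_seconds, line_after_gap) for the longest pause in the log."""
    previous = None
    best = (0.0, None)
    for line in log_lines:
        try:
            # The first 19 characters hold e.g. "Jul 03 11:40:10.102".
            stamp = datetime.strptime(line[:19], "%b %d %H:%M:%S.%f")
        except ValueError:
            continue  # PDU dumps such as "v:1 t:CON ..." carry no timestamp
        if previous is not None:
            gap = (stamp - previous).total_seconds()
            if gap > best[0]:
                best = (gap, line)
        previous = stamp
    return best


sample = [
    "Jul 03 11:40:10.104 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: received 25 bytes",
    "Jul 03 11:40:11.102 DEBG ** 192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: DTLS retransmit timeout",
    "Jul 03 11:40:11.102 DEBG *  192.168.88.50:53306 <-> 192.168.88.66:5684 DTLS: sent 307 bytes",
]
gap, line = largest_gap(sample)
print(f"{gap:.3f}s before: {line}")
```

For the three sample lines the pause sits right before the retransmit timeout, which matches the 1 second delay described above.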

This kind of code used to work almost instantly; now I have to wait 9 seconds (take a look at the timestamps) to get the details of 7 groups (!):

Python script (part of it):

    def listDevices(self):
        print("--> listDevices")
        gateway = Gateway()
        devicesArray = []
        print("GROUPS:")
        groups = self.api(self.api(gateway.get_groups()))
        for groupName in groups:
            print(groupName.path, "-->", groupName)

OUTPUT:

Jul 04 01:38:30 --> listDevices
Jul 04 01:38:30 GROUPS:
Jul 04 01:38:39 ['15004', 131093] --> <Group Salon RGB - off>
Jul 04 01:38:39 ['15004', 131084] --> <Group Biurko - off>
Jul 04 01:38:39 ['15004', 131089] --> <Group Pokój Lili górne - off>
Jul 04 01:38:39 ['15004', 131090] --> <Group Biurko Nevada - off>
Jul 04 01:38:39 ['15004', 131079] --> <Group Korytarz góra - off>
Jul 04 01:38:39 ['15004', 131091] --> <Group SuperGroup - off>
Jul 04 01:38:39 ['15004', 131092] --> <Group Kinkiety - off>

It looks to me now that every 7-8 hours the gateway falls into some state from which it may or may not recover by itself.
i predict my next error will be somewhere between 13 and 14 o'clock.

When this happens, i get the above AttributeError, not always at the same position in the script, but always during some kind of group processing.
After that, the error changes to RequestError, which does not abort my script, but the script still can't connect anymore (=> no responses at all).

running your command with my creds and ip

pi@baseberrypi:~/scripts $ coap-client -m put -u "yyy" -k "xxx" -e '{ "3311": [{ "5851": 3 }] }' "coaps://192.168.7.180:5684/15001/65575" -v 7
Jul 04 05:39:25 DEBG created DTLS endpoint 0.0.0.0:56658
v:1 t:CON c:PUT i:c0fd {} [ ]
Jul 04 05:39:25 DEBG sending CoAP request:
v:1 t:CON c:PUT i:c0fd {} [ Uri-Path:15001, Uri-Path:65575 ] :: '{ "3311": [{ "5851": 3 }] }'
Jul 04 05:39:25 DEBG *** new session 0x12c148
Jul 04 05:39:25 DEBG call dtls_write
Jul 04 05:39:25 DEBG *** add 0x12c678 to sendqueue of session 0x12c148
Jul 04 05:39:25 DEBG timeout is set to 90 seconds
Jul 04 05:39:25 DEBG received 60 bytes on fd 3
Jul 04 05:39:25 DEBG received 95 bytes on fd 3
Jul 04 05:39:25 DEBG received 25 bytes on fd 3
Jul 04 05:39:25 DEBG received 15 bytes on fd 3
Jul 04 05:39:25 INFO ** Alert: level 2, description 20
Jul 04 05:39:25 ALRT 20 invalidate peer
Jul 04 05:39:25 DEBG *** removed transaction 2421
Jul 04 05:39:25 DEBG *** EVENT: 0x0200
Jul 04 05:39:25 DEBG *** removed session 0x12c148
Jul 04 05:39:25 WARN received alert, peer has been invalidated
then after 3 minutes the call returns to bash

i then power cycle the gateway, as it is a pretty fast solution,
and my script can connect again.

if i run the command again, it now finishes instantly

pi@baseberrypi:~/scripts $ coap-client -m put -u "xxx" -k "yyy" -e '{ "3311": [{ "5851": 3 }] }' "coaps://192.168.7.180:5684/15001/65575" -v 7
Jul 04 06:04:29 DEBG created DTLS endpoint 0.0.0.0:57729
v:1 t:CON c:PUT i:1a2f {} [ ]
Jul 04 06:04:29 DEBG sending CoAP request:
v:1 t:CON c:PUT i:1a2f {} [ Uri-Path:15001, Uri-Path:65575 ] :: '{ "3311": [{ "5851": 3 }] }'
Jul 04 06:04:29 DEBG *** new session 0x556148
Jul 04 06:04:29 DEBG call dtls_write
Jul 04 06:04:29 DEBG *** add 0x556678 to sendqueue of session 0x556148
Jul 04 06:04:29 DEBG timeout is set to 90 seconds
Jul 04 06:04:29 DEBG received 60 bytes on fd 3
Jul 04 06:04:29 DEBG received 95 bytes on fd 3
Jul 04 06:04:29 DEBG received 25 bytes on fd 3
Jul 04 06:04:29 DEBG received 14 bytes on fd 3
Jul 04 06:04:29 DEBG received 53 bytes on fd 3
Jul 04 06:04:29 DEBG *** EVENT: 0x01de
Jul 04 06:04:29 DEBG received 52 bytes on fd 3
Jul 04 06:04:29 INFO ** application data:
Jul 04 06:04:29 DEBG set data to 0x5568e5 (pdu ends at 0x5568f7)
v:1 t:ACK c:4.05 i:1a2f {} [ ] :: 'Method Not Allowed'
Jul 04 06:04:29 DEBG *** removed transaction 38304
Jul 04 06:04:29 DEBG ** process incoming 4.05 response:
v:1 t:ACK c:4.05 i:1a2f {} [ ] :: 'Method Not Allowed'
4.05 Method Not Allowed
Jul 04 06:04:29 DEBG *** removed session 0x556148

i see @tomeczko has a SuperGroup too. any information about that? not everything i have is in there, so what's the criterion for being a member of it?

Update: about 3.5 hours later the RequestTimeout is back
of course fixed with another device restart
Update 2: automating the restart with a smart plug now; the errors appear more often than in my initial assessment
Update 3: i tried to introduce observing into my script, but it just calls the first callback once and then stops executing my script.

is there a forum where one can ask development questions?

Any difference if you test the change in #314 ?

@ggravlingen i updated the already installed aiocoap to 0.4.1
i guess this came with me installing tradfri via pip3 ?
so i wouldn't have needed to install libcoap via the script?

then i changed my import to use aiocoap

ValueError: Use APIFactory.init(…) to initialize APIFactory

if i change my

api_factory = APIFactory(host=args.host, psk_id=identity, psk=psk)

to

api_factory = APIFactory.init(host=args.host, psk_id=identity, psk=psk)

i get

Traceback (most recent call last):
  File "./mqtt2tradfri_v4.py", line 291, in <module>
    a = api_factory.request
AttributeError: 'coroutine' object has no attribute 'request'
sys:1: RuntimeWarning: coroutine 'APIFactory.init' was never awaited

why? do i need this?
i went by the sync examples when i wrote my code.

I'm not an expert, but looking at the code, try this:

api_factory = await APIFactory.init(host=args.host, psk_id=identity, psk=psk)

(note the await keyword).
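
For anyone coming from the sync examples, here is a self-contained sketch of why the missing `await` produces `'coroutine' object has no attribute 'request'`. `FakeFactory` is a hypothetical stand-in that only mimics the shape of an async `init` classmethod; it is not the pytradfri class, and the address is a documentation placeholder.

```python
import asyncio


class FakeFactory:
    """Hypothetical stand-in with an async factory classmethod."""

    def __init__(self, host):
        self.host = host

    @classmethod
    async def init(cls, host):
        await asyncio.sleep(0)  # placeholder for real async setup work
        return cls(host)

    async def request(self, command):
        await asyncio.sleep(0)  # placeholder for a network round trip
        return f"{self.host}:{command}"


async def main():
    not_awaited = FakeFactory.init("192.0.2.1")
    # Without await this is only a coroutine object; it has no .request.
    is_coroutine = asyncio.iscoroutine(not_awaited)
    not_awaited.close()  # suppress the "never awaited" RuntimeWarning

    factory = await FakeFactory.init("192.0.2.1")
    result = await factory.request("get_groups")
    return is_coroutine, result


outcome = asyncio.run(main())
print(outcome)
```

Since `await` is only valid inside an `async def`, a script written in the sync style has to move its logic into a coroutine such as `main()` and start it with `asyncio.run()`.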

Not sure if this is helpful but after some debugging, I observe the following issues/changed behaviors:

  • Put requests result in an empty response. But I assume this is normal behavior?
  • Get requests have 2 failure responses. Sometimes I get {"r": "06"}, sometimes I get an empty response.

BTW: If this is an issue with the firmware, is there any way to contact the IKEA development team? I understood that they have been quite helpful in the past.

@thomasdelaet The IKEA development team have previously acknowledged that we are working on this project but informed us that we are using their internal API and thus are on our own in terms of debugging etc.

I'm not 100% sure of this but this is quite likely an issue related to your local environment. I'm on the same version and don't experience any issues.

i will try to adjust my code on the weekend

@ggravlingen thanks for the input.

I've got a bit further in debugging. This is an example output from when the issues occur. I start getting these {"r": "06"} responses, whose meaning I don't know (any idea?), and empty responses. Then nothing anymore (timeouts are triggered). I've coded it to retry 3 times. When I connect via the IKEA app, it is unable to find the gateway. Any hints on how to further debug/fix this would be enormously appreciated!

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65642]
DEBUG:pytradfri.api.libcoap_api:Received: {"r":"06"}
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65643]
DEBUG:pytradfri.api.libcoap_api:Received: {"r":"06"}
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65630]
DEBUG:pytradfri.api.libcoap_api:Received: {"r":"06"}
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65657]
DEBUG:pytradfri.api.libcoap_api:Received:
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65656]
DEBUG:pytradfri.api.libcoap_api:Received:
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65631]
DEBUG:pytradfri.api.libcoap_api:Received:
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65632]
DEBUG:pytradfri.api.libcoap_api:Received: {"r":"06"}
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65721]
DEBUG:pytradfri.api.libcoap_api:Received:
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65634]
DEBUG:pytradfri.api.libcoap_api:Received:
error in retrieving state

DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65659]
DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65659]
DEBUG:pytradfri.api.libcoap_api:Executing 192.168.3.71 get ['15001', 65659]
request timed out in get_state
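
The log shows three kinds of outcome: a `{"r":"06"}` reply, an empty reply, and no reply at all. A minimal sketch of how a script might tell the first two apart from usable data and retry is below. Treating a reply that contains only an `"r"` field as a failure is an assumption based on this log, since the meaning of `"06"` is not known here; `classify_response` and `get_with_retry` are hypothetical helpers, not part of pytradfri.

```python
import json


def classify_response(raw_text):
    """Sort a raw gateway reply into 'empty', 'error' or 'ok'."""
    if raw_text is None or not raw_text.strip():
        return "empty"
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        return "error"
    # Assumption: a reply consisting only of an "r" field is a failure marker.
    if isinstance(payload, dict) and set(payload) == {"r"}:
        return "error"
    return "ok"


def get_with_retry(fetch, attempts=3):
    """Call fetch() up to `attempts` times and return the first usable payload."""
    for _ in range(attempts):
        raw = fetch()
        if classify_response(raw) == "ok":
            return json.loads(raw)
    return None


replies = iter(["", '{"r":"06"}', '{"5850": 1}'])
print(get_with_retry(lambda: next(replies)))
```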

Is the app unable to connect before or after you’ve started hitting the gateway with commands? I’ve tried in the past to reproduce the errors on the gateway and was able to if I hit (flooded) the gateway with a series of commands that it didn’t have time to process. Is something similar happening here, ie you’re constantly polling a lot of devices?

Hey,

First of all: thank you so much for looking into this! Very much appreciated.

The reason I have developed my own custom bridge (I use pytradfri and expose the devices over MQTT) is that the default integration with Home Assistant was very unreliable for my setup. I traced this down to the observe commands, and since I only control my lights through Home Assistant, their state only changes through Home Assistant, so I have no need for the observe functionality. I had the sense that the gateway was overwhelmed by all the observe commands. I have a setup with 75 light bulbs.

The current issue started after the June 30 firmware update on the gateway. It seems that the gateway just stops working after a while but then resumes operation. During this time, I get these timed-out responses via pytradfri, and I also cannot connect via the app. In my script I have implemented a time.sleep(0.05) after every command that hits the gateway to avoid flooding.
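
A fixed `time.sleep(0.05)` after every command can be wrapped in a small helper that only sleeps for the time still missing since the previous command. This is a sketch under that assumption, not the code from my bridge; `Throttle` is a hypothetical name, and the injectable `clock` and `sleep` parameters exist only to make it testable.

```python
import time


class Throttle:
    """Enforce a minimum pause between consecutive gateway commands."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(0.05)
for command in ("get 65642", "get 65643", "get 65630"):
    throttle.wait()  # call right before each command is sent
    print("sending", command)
```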

i am polling once per minute for 3 bulbs, 3 sockets and 2 blinds with 4 remotes.
i am waiting 2 seconds after getting devices and groups each to avoid flooding.

Are you executing the commands from a wired or WiFi device? If WiFi, try going wired and see if this helps. I’m still inclined to say this error is due to something in your local environment. Could you please describe your full setup that you’re running on here? Please also include the type of router.

i use a raspberry pi 3 with a wired ethernet connection going to a 16-port switch (gs1100), to which the tradfri gateway is also directly connected.
The only devices more than 4m away from the tradfri hub are one bulb and one remote in the other room.
My router is a fritzbox 7390, but i don't see how that's relevant here

@hmax42 that doesn't sound like a troublesome setup - I'm also running wired on a RPI3. The reason I asked about the router is that it's one of the devices handling traffic between the gateway and the controller (the PI).

Can you please try running this code modified from your initial post? Just want to see if it's all groups failing or just one of them.

       try:
           groups_command = gateway.get_groups()
           groups_commands = api(groups_command)
           groups = api(groups_commands)
       except Exception:
           print("error groups")
       if groups:
           for group in groups:
               try:
                   print(group)
               except AttributeError as e:
                   print(e)

Setup that I'm running:

  • The Tradfri gateway and the computer running the pytradfri script are on the same network, both connected to a switch. The switch is a 24-port TP-Link without any special config (no VLANs, no firewall)
  • There are no firewalls running on the local network
  • The computer running the script is an Intel NUC running Ubuntu. Python version 3.8.10
  • I'm using the latest version of pytradfri in sync mode and used the script to install libcoap
  • I've posted the script that I'm using here: pytradfri-mqtt.py

i have 6 groups, all show the None error.
i wait 2s after querying devices before querying the groups.

<class 'AttributeError'>
'NoneType' object has no attribute 'get'
<class 'AttributeError'>
'NoneType' object has no attribute 'get'
<class 'AttributeError'>
'NoneType' object has no attribute 'get'
<class 'AttributeError'>
'NoneType' object has no attribute 'get'
<class 'AttributeError'>
'NoneType' object has no attribute 'get'
<class 'AttributeError'>
'NoneType' object has no attribute 'get'
11/07/2021 17:52:58 error devices
<class 'pytradfri.error.RequestTimeout'>
11/07/2021 17:54:09 error devices
<class 'pytradfri.error.RequestTimeout'>

interesting though, it just came back to life

11/07/2021 18:18:40 error devices
<class 'pytradfri.error.RequestTimeout'>
11/07/2021 18:19:50 error devices
<class 'pytradfri.error.RequestTimeout'>
Id State Dimmer Name
Memb. State Dimmer Name
<Group Schlafzimmer - off>
131074 -- x Schlafzimmer
65540 False 64 TRADFRI bulb E27 CWS opal 600lm
65561 x x TRADFRI remote control 2

that's about 30mins of outage

So, in both your cases @hmax42 and @thomasdelaet it seems the gateway just stops responding. That, in turn, causes your scripts to stop working. I would thus argue that this library works as expected, but that there is something in the lower level communication between the client (you) and the gateway that's causing the issues.

To continue the debugging, have you set up any schedule or similar automation through the app? There might be a few installed by default? If so, any change if you remove them? Is someone else in the household using the app while you're using the script? Are there any other processes or scripts communicating with the gateway while you run the scripts?

sorry for the lack of updates

  1. i have no automations in the ikea app. i have the 3 default timed automations, disabled. and the scenes feature still offers to "explore", i.e. never used before. all automation would happen via MQTT
  2. i do have 2 echo dots, but those are off most of the time and are not relevant. also never defined automations there.
  3. my system works pretty well now with automatic reboots of the gateway if the tradfri script gets more than 2 fails.
    that's one reason why i haven't tackled the rewrite with async/await yet, but i consider that only a workaround, so the rewrite will happen eventually.
    but sometimes there is an error and after that it works again without any restarts.
    today, restarts were at 7:10, 11:30 & 15:50. i will be waiting if one occurs at 20:10
    at 17:25 there was a single error, but no restart was needed, as no followup fails happened.
  4. maybe it's irrelevant, maybe not: some devices become inactive over time
    • ikea buttons & remotes
      • work as intended on all devices of that group
    • ikea app
      • devices are shown as inactive
      • a group consisting only of these is also inactive, no interaction possible
      • a mixed group of inactive and active devices works only on the active devices
    • pytradfri
      • lists all devices correctly in their groups
      • the states of inactive groups or devices are not returned correctly; usually it says "off" for lights/sockets (both partially and fully inactive groups)
      • sending a command to a group works, all members get turned on or off, including the inactive ones!
        • but the group state does not change, IIRC
      • sending a command to a fully inactive device has no effect
    • repairing the inactive state
      • internet research unrelated to python yielded the answer: unplug and replug the device in question and it becomes active again in the ikea app
      • pytradfri now returns the correct state for these devices (and groups) again
      • the strange thing: the 4 affected devices (2 bulbs, 2 sockets) are two of the closest and two of the farthest devices from the gateway in the living room. one additional bulb in the next room was inactive once, but is still active now.
    • so far, only bulbs and sockets became inactive ever

maybe the gateway times out querying the inactive devices, but this is just a guessing game.
i will see what happens tonight at 20:10, at the moment no devices are inactive.
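
The automatic reboot from point 3 can be sketched as a small counter. This is an illustration, not my actual script: it assumes the failures are counted consecutively (a success resets the counter), and `restart` stands for whatever switches the smart plug off and on. `FailureWatchdog` is a hypothetical name.

```python
class FailureWatchdog:
    """Trigger a restart once more than max_failures requests fail in a row."""

    def __init__(self, restart, max_failures=2):
        self.restart = restart
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok):
        """Record one request outcome; return True if a restart was triggered."""
        if ok:
            self.failures = 0  # a single error followed by success needs no restart
            return False
        self.failures += 1
        if self.failures > self.max_failures:
            self.restart()
            self.failures = 0
            return True
        return False


restarts = []
watchdog = FailureWatchdog(lambda: restarts.append("power cycle"))
results = [watchdog.record(ok) for ok in (False, True, False, False, False)]
print(results, restarts)
```

With these settings a lone failure is ignored and the third failure in a row triggers the power cycle.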

@ggravlingen so IKEA said, we are using their internal API. is there any other API?

just now the error came while querying groups.
the times i gave above were not exact; they were:
7:08, 11:29, 15:48 and now 20:07
so it's about 260 minutes between gateway power-on and the mysterious timeout
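
The interval can be checked from the four times above with a few lines of Python; only the times come from the observations, the variable names are arbitrary.

```python
from datetime import datetime

times = ["7:08", "11:29", "15:48", "20:07"]
stamps = [datetime.strptime(t, "%H:%M") for t in times]

# Minutes between each restart and the next failure on the same day.
gaps = [int((later - earlier).total_seconds() // 60)
        for earlier, later in zip(stamps, stamps[1:])]
average = sum(gaps) / len(gaps)

print(gaps, round(average, 1))
```

The three gaps are 261, 259 and 259 minutes, so roughly 4 hours 20 minutes each time.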

I can confirm that the library works as expected:

  • When light is offline, the ikea app also shows the light as offline
  • When connection is not possible, the ikea app can also not connect.

The issue is really the firmware update:

  • Lights go offline much more frequently. My workaround is to use relays that I had already installed on the light circuits: whenever Home Assistant detects that a light is offline, it switches the relay off and on.
  • The gateway also goes offline. I monitor this in Home Assistant with device_tracker, and I installed a Shelly plug on the gateway and use the same off/on trick to restart it when this happens.

Does anybody have a contact in the IKEA Tradfri dev team so I could let them know?

Thomas

i will now try a new approach: deny internet access to the gateway

Is there a known way to downgrade firmware? This combined with then blocking internet access might help ;-)

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

Theoretical question:
Would it be possible/more viable to have some kind of home automation software with a zigbee usb stick instead of using the ikea tradfri hub?

I would definitely give it a go and see if it suits your needs. I use the Tradfri at home and the Conbee in a summer house. The Conbee often loses connection to the nodes but the gateway at home works with very few interruptions. But others might have a different experience. I think it depends a lot on your environment.

Theoretical question:
Would it be possible/more viable to have some kind of home automation software with a zigbee usb stick instead of using the ikea tradfri hub?

You can access most (I do not have all flavours, so no guarantees) of the IKEA devices with a ZigBee usb stick, however IKEA seems to have added some proprietary functions, which the gateway uses, at least I see unknown messages.

I tried to bump the aiocoap library to 0.4.1, and instead of breaking down every 3-4 hours, it has now been running for around 20 hours. I am currently attacking the pytradfri library to understand the code, and add some debug. I believe I have found 2 places where "await" is missing, and thus the function return is somewhat random.

aiocoap library to 0.4.1

that's the version i have been using all along

According to this PR there are shutdown problems in 0.4.1:
#314
I can see there is a PR solving this issue (which works for me), but it seems aiocoap is very slow in accepting PRs

I made a quick fix to at least ensure the library does not break down when the raw device isn't present.

I have been playing with the observe command as well, and I can see the messages from the IKEA app are slightly different from those sent when using this lib.

In order to test I made a setup with a ZigBee stick and direct connection to:

  • light bulbs (E27)
  • moving sensor
  • Coffee button

Everything works nicely (except the coffee button, which lacks some functionality), and I can flood messages without problems, so it is the gateway that is the bottleneck.

I acquired a Lidl Silverlight Zigbee Gateway and integrated it into Home Assistant, which i installed on the Raspberry Pi 3+, where my python scripts were already running.
I moved all my IKEA devices over and stopped using the IKEA gateway.

Advantages:
No regular service intervals with inaccessibility of the gateway
my spare blinds button can now be used for anything

Disadvantages:
i will probably stop using my MQTT android app, as home assistant does not broadcast anything to mqtt; that would require more automations just to send the messages
home assistant is much more power hungry than my 5 python scripts i used before on the raspberry
i will watch how this performs in the future

But i still have the weird behavior that the zigbee entities become invisible/unresponsive after a few days

@hmax42 this is off-topic for this issue. Open an issue on integrations/libraries that have a problem.

@janiversen true, but i wanted to add it, because
a) this observation was part of the discussion
b) i can now definitely say that this behavior has nothing to do with the IKEA gateway, because it is off.

update
i have to admit a mistake on my side: the gateway was offline

Well, it is known that zigbee devices sleep and lose connection from time to time, especially the battery powered ones, but also e.g. lights (especially if you turn the power off). So whatever you use should do an automatic reconnect. We will be introducing that in tradfri soon.

should this issue be closed?
it was clearly analysed that the lib has no influence on the "housekeeping"/whatever of the IKEA gateway.

thank you to all who have contributed here, even if there is no solution