AVSystem / Anjay-esp32-client

Anjay ESP-32 LwM2M client

Anjay-esp32-client stop working after registration update failure

AxelLin opened this issue

I hit a registration update failure when a network interface goes down temporarily, and Anjay-esp32-client then stops working.

Below are the error messages I observed:
E (105533) anjay: ERROR [avs_net] [/home/axel/Anjay-esp32-client/main/anjay/deps/avs_commons/src/net/compat/posix/avs_net_impl.c:1198]: send failed
W (105544) anjay: WARNING [anjay] [/home/axel/Anjay-esp32-client/main/anjay/src/core/servers/anjay_register.c:878]: failure while receiving Update response: No route to host
E (105562) anjay: ERROR [anjay] [/home/axel/Anjay-esp32-client/main/anjay/src/core/servers/anjay_register.c:844]: could not send registration update for SSID==1: 2

It stops working after the above messages.
The point is that it does not recover after the network interface is back up.
It looks like anjay_event_loop_run does not treat this as a fatal error,
so the application has no way to know that it should handle it.

Hi!

We have noticed this issue and currently have a fix in our internal repository. It will be addressed in the next release, in the following days.

Could you confirm if the issue is resolved?

Hi,

  1. Tested with Ethernet: my observation is that it enters offline mode but never exits it.
    i.e. it is completely broken now with Ethernet.

  2. Tested with WiFi (disconnected from WiFi for 10 seconds, then reconnected).
    My debug print shows that it enters offline mode and then exits it when WiFi is back.
    But it still does not work even after exiting offline mode.
    It seems the UDP socket is deleted; "anjay/src/core/servers/anjay_reload.c:176]: servers reloaded" does not help.

BTW, I'd appreciate it if you committed bug fixes in separate commits.

I just checked the commit log and realized that you commit a lot of changes in a single commit.
That does not make sense, and it makes it difficult to figure out what was changed, broken, or fixed.
The tags are useless if you tag each commit.

Hi,

This is because we do most of the development on our internal repositories which include features that are only available commercially. The code is then post-processed for public open source releases. Unfortunately that makes it infeasible to publish the entire commit history, so the changes between releases are squashed into single commits for the open source repositories.

I'm sorry for this inconvenience.

Just curious whether the "stop working after registration update failure" issue only happens on Anjay-esp32-client,
or whether it also happens on Anjay-freertos-client and Anjay-mbedos-client.
I ask because I don't find a similar fix in Anjay-freertos-client and Anjay-mbedos-client.

Hi,
This has been broken for 2 months; just wondering if this will be fixed soon.
Note that 22.01.1 is worse than 22.01 because it is completely broken when using Ethernet (see #2 (comment))

Hi,

Sorry for not responding for so long.

We are currently not targeting the Ethernet interface - in fact we don't have any boards with a proper Ethernet port, so this will not work at the moment. Feel free to contribute support for it if you need it.

As far as the broken connection issue goes, my suspicion is that the library is flagging the connection as failed and expecting the client application to react. You can add handling of this case using e.g. code such as this:

diff --git a/main/main.c b/main/main.c
index 9468ae3..9e0f4e4 100644
--- a/main/main.c
+++ b/main/main.c
@@ -155,6 +155,10 @@ static void update_connection_status_job(avs_sched_t *sched,
         connected_prev = true;
     }
 
+    if (anjay_all_connections_failed(anjay)) {
+        anjay_transport_schedule_reconnect(anjay);
+    }
+
     AVS_SCHED_DELAYED(sched, NULL, avs_time_duration_from_scalar(1, AVS_TIME_S),
                       update_connection_status_job, &anjay, sizeof(anjay));
 }

I hope that this will work for you.

Please note that this project is intended more as an example and demonstration of library usage than as something ready for use in the field, so we do not consider this a bug. Different users may have different requirements when it comes to reconnecting after a hard failure like this (immediately vs. after a predefined time vs. exponential backoff etc.), which is why this is not done automatically by the library.
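
As a rough illustration of the exponential backoff option mentioned above, the delay calculation could look like the sketch below. The names reconnect_delay_s, RECONNECT_BASE_DELAY_S and RECONNECT_MAX_DELAY_S are hypothetical and not part of Anjay, and the values are placeholders rather than recommendations.

```c
#include <stdint.h>

/* Hypothetical policy parameters; tune them for your deployment. */
#define RECONNECT_BASE_DELAY_S 2u
#define RECONNECT_MAX_DELAY_S 300u

/*
 * Returns the number of seconds to wait before reconnect attempt number
 * `attempt` (0-based): the base delay doubled once per previous attempt,
 * capped at RECONNECT_MAX_DELAY_S.
 */
static uint32_t reconnect_delay_s(unsigned attempt) {
    uint32_t delay = RECONNECT_BASE_DELAY_S;
    /* Stop doubling once the cap is reached, so the value cannot overflow. */
    while (attempt-- > 0 && delay < RECONNECT_MAX_DELAY_S) {
        delay *= 2;
    }
    return delay < RECONNECT_MAX_DELAY_S ? delay : RECONNECT_MAX_DELAY_S;
}
```

One possible use, assuming the periodic job from the diff above: count consecutive ticks on which anjay_all_connections_failed() reports a failure, wait reconnect_delay_s(attempt) seconds before scheduling the next reconnect, and reset the counter once the connection is restored.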

@kFYatek

  1. This is indeed a bug with the Ethernet interface, especially since you have a config option to use the Ethernet interface.
  2. The recent change to use esp_wifi_sta_get_ap_info() is wrong if you consider the Ethernet interface.
    Anyway, it does not work well even with the WiFi interface, as I reported. I think that change needs to be reverted.
  3. The change mentioned above, adding anjay_transport_schedule_reconnect(), works.

BTW, this project currently fails to compile with the esp-idf master tree. Just FYI.
I notice that avs_commons still uses MBEDTLS_PRIVATE, which is likely to break in a future minor version of Mbed TLS.
Link: https://github.com/Mbed-TLS/mbedtls/blob/development/docs/3.0-migration-guide.md#most-structure-fields-are-now-private

@AxelLin I'm glad that the change works for you.

This is indeed a bug with the Ethernet interface, especially since you have a config option to use the Ethernet interface.

The option is in the config UI because we used the example app framework in the initial version. This will be removed in the upcoming release, which will only support WiFi.

BTW, this project currently fails to compile with the esp-idf master tree. Just FYI.

The current version is tested using ESP-IDF 4.3, and the upcoming release will be targeting ESP-IDF 4.4. Our goal is to support the latest stable release, not necessarily the latest master tree.

I notice that avs_commons still uses MBEDTLS_PRIVATE, which is likely to break in a future minor version of Mbed TLS.

Yes, we are aware that this is a hack. The current version indeed does not work with Mbed TLS 3.1, as it made some previously private fields public again - we have a fix for that in our internal branch, which will be released shortly. However, some functionality is still missing from the public API, so that's why we couldn't remove the usage of MBEDTLS_PRIVATE even for Mbed TLS 3.1. We plan on regularly updating avs_commons to support any upcoming Mbed TLS releases. However, according to our outlook, the adoption of Mbed TLS 3.x remains low for now, so we don't see it as an absolute priority.

In any case, thank you for all the comments and suggestions!
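
The fragility of MBEDTLS_PRIVATE discussed above can be illustrated with a stand-in macro. This is a simplified sketch, not Mbed TLS code: DEMO_PRIVATE, DEMO_ALLOW_PRIVATE_ACCESS, demo_ctx_t and demo_peek_state are hypothetical names, and the name-mangling scheme is an assumption about how such a macro can work.

```c
/*
 * Hypothetical stand-in for a "private field" macro: unless the opt-in
 * define is set, the member name is mangled so that plain `ctx->state`
 * does not compile in user code.
 */
#ifdef DEMO_ALLOW_PRIVATE_ACCESS
#define DEMO_PRIVATE(member) member
#else
#define DEMO_PRIVATE(member) private_##member
#endif

typedef struct {
    int DEMO_PRIVATE(state); /* declared as `private_state` by default */
} demo_ctx_t;

/*
 * Reaching into the field through the macro compiles today, but it relies
 * on a layout the library does not promise to keep, so a later release may
 * rename, move, or remove the field.
 */
static int demo_peek_state(const demo_ctx_t *ctx) {
    return ctx->DEMO_PRIVATE(state);
}
```

Code written this way keeps compiling only as long as the field still exists under the same name, which is why a public accessor is preferable wherever the library provides one.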

@AxelLin I'm glad that the change works for you.

This is indeed a bug with the Ethernet interface, especially since you have a config option to use the Ethernet interface.

The option is in the config UI because we used the example app framework in the initial version. This will be removed in the upcoming release, which will only support WiFi.

I do hope you keep the Ethernet config option and make it actually work.
(I don't see any good reason to remove it, since LwM2M should work with either WiFi or Ethernet.)

BTW, I have now upgraded to Anjay 2.15.0.
With anjay_event_loop_run_with_error_handling(), it looks fine with both WiFi and Ethernet.