obgm / libcoap

A CoAP (RFC 7252) implementation in C

Do not destroy mutexes in `coap_cleanup()`

hasheddan opened this issue

Environment

  • Build System: CMake
  • Operating System: Ubuntu
  • Operating System Version: 20.04
  • Hosted Environment: ESP-IDF

libcoap Configuration Summary

Configuration can be found here: https://github.com/golioth/golioth-firmware-sdk/pull/163/files#diff-152d8027819ee12f017ab0edd01f2b2f218d24b7f751070b7d130842be298a3a

Problem Description

Tests run here reveal a null queue when acquiring a semaphore (see assert here). The decoded stack trace is as follows:

0x4037b36f: xt_utils_compare_and_set at /home/hasheddan/code/github.com/espressif/esp-idf/components/xtensa/include/xt_utils.h:215
 (inlined by) esp_cpu_compare_and_set at /home/hasheddan/code/github.com/espressif/esp-idf/components/esp_hw_support/cpu.c:483
0x40380bf9: spinlock_acquire at /home/hasheddan/code/github.com/espressif/esp-idf/components/esp_hw_support/include/spinlock.h:121
 (inlined by) xPortEnterCriticalTimeout at /home/hasheddan/code/github.com/espressif/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:501
0x4037e4b4: vPortEnterCritical at /home/hasheddan/code/github.com/espressif/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:584
 (inlined by) xQueueSemaphoreTake at /home/hasheddan/code/github.com/espressif/esp-idf/components/freertos/FreeRTOS-Kernel/queue.c:1671
0x4200390d: pthread_mutex_lock_internal at /home/hasheddan/code/github.com/espressif/esp-idf/components/pthread/pthread.c:614
0x42003a9a: pthread_mutex_lock at /home/hasheddan/code/github.com/espressif/esp-idf/components/pthread/pthread.c:644
0x420189c6: coap_log_impl at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/external/libcoap/src/coap_debug.c:1199
0x4201b653: setup_client_ssl_session at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/external/libcoap/src/coap_mbedtls.c:1151 (discriminator 1)
0x4201b7e5: coap_dtls_new_mbedtls_env at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/external/libcoap/src/coap_mbedtls.c:1518
0x4201bb97: coap_dtls_new_client_session at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/external/libcoap/src/coap_mbedtls.c:1830
0x42022aaa: coap_dtls_establish at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/external/libcoap/src/coap_dtls.c:25
0x42020b3b: coap_session_check_connect at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/external/libcoap/src/coap_session.c:1248
0x4202202a: coap_new_client_session_psk2 at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/external/libcoap/src/coap_session.c:1370 (discriminator 3)
0x42010921: create_session at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/src/golioth_coap_client.c:601 (discriminator 85)
0x4201223a: golioth_coap_client_thread at /home/hasheddan/code/github.com/golioth/golioth-firmware-sdk/src/golioth_coap_client.c:894
0x403809d1: vPortTaskWrapper at /home/hasheddan/code/github.com/espressif/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

Expected Behavior

The assert should not fail.

Actual Behavior

The assert fails.

Steps to reproduce

See previously linked source code and test run here: https://github.com/golioth/golioth-firmware-sdk/actions/runs/6115076185/job/16598003472#step:9:443

Code to reproduce this issue

I believe the issue here is that we are calling coap_cleanup() and then attempting to create a new session. In cd7b5de, coap_cleanup() was modified to destroy mutexes. Unfortunately, we cannot re-initialize these mutexes, because calling coap_startup() again short-circuits: coap_started was set to 1 the first time it was called.

I have tested a patch where the mutex destroy block is eliminated and the tests are passing correctly again.
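
The sequence described above can be modeled in a few lines of C. This is a minimal sketch, not libcoap's code: `model_started`, `model_mutex_valid`, and the `model_*` functions are hypothetical stand-ins for `coap_started`, the logging mutex, `coap_startup()`, `coap_cleanup()`, and `coap_log_impl()`. A boolean stands in for the mutex so the sketch stays well-defined, since locking a destroyed pthread mutex is undefined behavior.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT libcoap's real internals. */
static bool model_started = false;     /* plays the role of coap_started */
static bool model_mutex_valid = false; /* true while the logging mutex exists */

/* One-shot initialization: every call after the first returns early. */
static void model_startup(void) {
    if (model_started) {
        return;
    }
    model_started = true;
    model_mutex_valid = true;
}

/* Destroys the mutex but leaves model_started set, as described in the issue. */
static void model_cleanup(void) {
    model_mutex_valid = false;
}

/* Logging needs the mutex; returns false where the real code would assert. */
static bool model_log(const char *msg) {
    if (!model_mutex_valid) {
        return false;
    }
    printf("%s\n", msg);
    return true;
}
```

Calling `model_startup()`, `model_cleanup()`, and `model_startup()` in that order leaves `model_mutex_valid` false, so the next `model_log()` call fails. That mirrors the trace above, where `coap_log_impl` reaches `pthread_mutex_lock` on a mutex that no longer exists.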

I have opened a fix in #1226.

#1226 is not the answer here.

coap_startup() should be called at the start of golioth_coap_client_thread(), before the while(1). For example, if any coap_log*() function is called before coap_startup(), then locking the mutex it needs will fail. See coap_startup(3).

Note: coap_startup() is called in coap_new_context() 'just in case coap_startup() is not explicitly called'.

I'm not sure you need to call coap_cleanup() at all in your code, except when create_context() or create_session() fails, at which point this golioth client thread is useless and everything should be cleaned up.
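
The suggested structure can be sketched as follows. This is only an illustration under stated assumptions: `stub_startup()`, `stub_cleanup()`, and `stub_create_session()` are hypothetical stubs standing in for `coap_startup()`, `coap_cleanup()`, and the SDK's `create_session()`, and `client_thread_model()` uses a bounded loop in place of the real `while(1)`.

```c
#include <stdbool.h>

/* Hypothetical counters and stubs; they only record how often each is called. */
static int startup_calls = 0;
static int cleanup_calls = 0;

static void stub_startup(void) { startup_calls++; } /* stands in for coap_startup() */
static void stub_cleanup(void) { cleanup_calls++; } /* stands in for coap_cleanup() */

/* Pretend session setup; succeeds only when `ok` is true. */
static bool stub_create_session(bool ok) { return ok; }

/*
 * Models the client thread: start up once before the loop, and clean up
 * only when session setup fails and the thread cannot continue.
 * `fail_at` is the cycle index at which setup fails, or -1 for no failure.
 * Returns the number of cycles that completed.
 */
static int client_thread_model(int cycles, int fail_at) {
    int completed = 0;

    stub_startup(); /* once, before the loop */

    for (int i = 0; i < cycles; i++) {
        if (!stub_create_session(i != fail_at)) {
            stub_cleanup(); /* unrecoverable: tear everything down and exit */
            return completed;
        }
        /* ... use the session, then release only the session itself ... */
        completed++;
    }
    return completed;
}
```

With no failure, the cleanup stub is never called, however many reconnect cycles run. With a failure, it is called exactly once, on the way out of the thread.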

Thanks for the guidance here @mrdeep1! It looks like even prior to cd7b5de our use of coap_cleanup() was not the intended pattern. I'll close this out, and hopefully it can serve as a reference if others hit similar issues 👍🏻