sysrepo / sysrepo

YANG-based configuration and operational state data store for Unix/Linux applications

Home Page:http://www.sysrepo.org

double lock when running sr_install_module on QNX

amamory opened this issue · comments

The Issue:

Assume an AMD platform running QNX 710. When trying to install a custom YANG module with "sysrepoctl -i", I get the message
"Initializing pthread mutex failed", which comes from the function "_sr_mutex_init".

How to reproduce it:
We are using Sysrepo version 2.2.60. Then we do:

  • apply the attached qnx.patch;
  • print the lock memory address in the function _sr_mutex_init:
    • printf ("\n\nXXXXXXXXXXXXXX lock: %p, %d, %d\n\n", (void *)lock, shared, robust);
  • run sysrepoctl -i

With these prints, you will see that during the YANG installation the same lock is initialized twice, which appears to be undefined behavior. On Linux this has no visible consequence: the operation completes, but it looks like a silent error. The issue only becomes apparent when running on QNX.
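
A minimal sketch of how a repeated initialization of the same address could be flagged automatically instead of by reading the prints. The helper name `track_mutex_init` and the fixed-size table are hypothetical and not part of Sysrepo. The sketch is not thread-safe, and a repeated address only points to a double init if the mutex was not destroyed (or its memory remapped) in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TRACKED 1024 /* hypothetical limit, large enough for a debug run */

static const void *tracked[MAX_TRACKED];
static size_t tracked_count;

/*
 * Record the address of a mutex that is about to be initialized.
 * Returns 1 if the address was already recorded, 0 if it is new,
 * and -1 if the table is full.
 */
static int
track_mutex_init(const void *addr)
{
    size_t i;

    assert(addr != NULL);

    for (i = 0; i < tracked_count; ++i) {
        if (tracked[i] == addr) {
            fprintf(stderr, "mutex at %p initialized again\n", (void *)addr);
            return 1;
        }
    }
    if (tracked_count == MAX_TRACKED) {
        return -1;
    }
    tracked[tracked_count++] = addr;
    return 0;
}
```

Calling `track_mutex_init(lock)` at the top of the init function would then report the second initialization of an address on stderr.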

Here is the stack trace. Only the levels 0 to 7 are relevant:

#0  0x0000000101a6c9b8 in _sr_mutex_init (lock=0x180003008, shared=1, robust=1)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/common.c:2228
#1  0x0000000101a6ca5b in sr_mutex_init (lock=0x180003008, shared=1)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/common.c:2245
#2  0x0000000101a6cc5c in sr_rwlock_init (rwlock=0x180003008, shared=1)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/common.c:2299
#3  0x0000000101a9dc8e in sr_shmmod_fill (shm_mod=0xa1ebd80, shm_mod_idx=0, sr_mod=0xa247630, old_smod=0xa218048)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/shm_mod.c:163
#4  0x0000000101aa043b in sr_shmmod_store_modules (shm_mod=0xa1ebd80, sr_mods=0xa2475d0)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/shm_mod.c:792
#5  0x0000000101a5763e in _sr_install_modules (conn=0xa1ebc80, search_dirs=0x9c9ce98 "/root/models/", data=0x0, data_path=0x0,
    format=LYD_UNKNOWN, new_mods=0x80c8240, new_mod_count=0x80c8230)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/sysrepo.c:1404
#6  0x0000000101a5796e in sr_install_module2 (conn=0xa1ebc80, schema_path=0x9ca9162 "/root/models//examples_1.yang",
    search_dirs=0x9c9ce98 "/root/models/", features=0x0, module_ds=0x0, owner=0x0, group=0x0, perm=0, data=0x0, data_path=0x0,
    format=LYD_UNKNOWN)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/sysrepo.c:1469
#7  0x0000000101a577cd in sr_install_module (conn=0xa1ebc80, schema_path=0x9ca9162 "/root/models//examples_1.yang",
    search_dirs=0x9c9ce98 "/root/models/", features=0x0)
    at /home/whc9fe/.conan/data/sysrepo/2.2.60-stable+local.0/adas/master/build/10438606b010765bcae3150943c1ab0b575697e3/src/sysrepo.c:1438
#8  0x0000000009189920 in tests::tests_utils::SysrepoModulesSetup::install_modules (this=0xa1cc768, mod_names=...)
    at /home/whc9fe/DEV/getk-gnmi/src/gnmi/unit-tests/utils/sysrepo_setup.hpp:225
#9  0x000000000918910e in tests::tests_utils::SysrepoModulesSetup::setup_test_modules (this=0xa1cc768, mod_names=...)
    at /home/whc9fe/DEV/getk-gnmi/src/gnmi/unit-tests/utils/sysrepo_setup.hpp:166
#10 0x000000000918c82f in tests::sysrepo_agent_tests::SysrepoAgentTest::SetUp (this=0xa1cc750)
    at /home/whc9fe/DEV/getk-gnmi/src/gnmi/unit-tests/../agent/unit-tests/sysrepo_agent_tests/sysrepo_agent_tests.hpp:57

We applied a second patch that destroys the lock before initializing it again, and with it QNX completes the operation:

diff --git a/src/common.c b/src/common.c
index 232808d..5f7f4de 100644
--- a/src/common.c
+++ b/src/common.c
@@ -2222,9 +2222,12 @@ _sr_mutex_init(pthread_mutex_t *lock, int shared, int robust)
         }

         if ((ret = pthread_mutex_init(lock, &attr))) {
-            pthread_mutexattr_destroy(&attr);
-            sr_errinfo_new(&err_info, SR_ERR_SYS, "Initializing pthread mutex failed (%s).", strerror(ret));
-            return err_info;
+            pthread_mutex_destroy(lock);
+            if ((ret = pthread_mutex_init(lock, &attr))) {
+                pthread_mutexattr_destroy(&attr);
+                sr_errinfo_new(&err_info, SR_ERR_SYS, "Initializing pthread mutex failed %p attr set %p(%s).", lock, &attr, strerror(ret));
+                return err_info;
+            }
         }
         pthread_mutexattr_destroy(&attr);
     } else {

This second patch more or less proves that a double mutex init is causing the issue. In the end, the QNX behavior is the expected one; the real issue is that Linux lets it pass silently.
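
For context, POSIX leaves re-initializing an already initialized mutex undefined, and an implementation that detects it may return `EBUSY`, so different systems can legitimately behave differently here. The sketch below shows the usual pattern for initializing a process-shared, robust mutex once, with the attribute object released on every path. The function name `init_shared_robust_mutex` is hypothetical, and this is not Sysrepo's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Initialize a process-shared, robust mutex.
 * Returns 0 on success or the errno-style code of the failing call.
 * The caller must make sure the mutex is not already initialized.
 */
static int
init_shared_robust_mutex(pthread_mutex_t *lock)
{
    pthread_mutexattr_t attr;
    int ret;

    assert(lock != NULL);

    if ((ret = pthread_mutexattr_init(&attr))) {
        return ret;
    }
    if ((ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)) ||
            (ret = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST))) {
        pthread_mutexattr_destroy(&attr);
        return ret;
    }

    ret = pthread_mutex_init(lock, &attr);

    /* the attribute object is no longer needed, whether init succeeded or not */
    pthread_mutexattr_destroy(&attr);
    return ret;
}
```

If the same memory has to be reused, the matching cleanup is a single `pthread_mutex_destroy` at the point where the mutex goes out of use, rather than a destroy-and-retry after a failed init.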

Final remarks:
I know this is not the latest version of Sysrepo. But have there been any mutex-related fixes in the past months that would justify redoing this whole process on the latest version? It would take a while: redoing the patches and testing on different platforms. If possible, we would prefer to stay on this older version, unless you have a strong indication that the issue is already solved in the newest version.

Any thoughts?

Best regards,

I have added an address print and got this output:

[WRN] ## mutex init 0x16632f0
[WRN] ## mutex init 0x1663328
[WRN] ## mutex init 0x16633d8
[WRN] ## mutex init 0x1663478
[WRN] ## mutex init 0x1663540
[WRN] ## mutex init 0x16635e8
[WRN] ## mutex init 0x7f050f007008
[WRN] ## mutex init 0x7f050f007030
[WRN] ## mutex init 0x7f050f0070b8
[WRN] ## mutex init 0x7f050ee57008
[WRN] ## mutex init 0x7f050ee57090
[WRN] ## mutex init 0x7f050ee570d8
[WRN] ## mutex init 0x7f050ee57160
[WRN] ## mutex init 0x7f050ee571a8
[WRN] ## mutex init 0x7f050ee57230
[WRN] ## mutex init 0x7f050ee57278
[WRN] ## mutex init 0x7f050ee57300
[WRN] ## mutex init 0x7f050ee57348
[WRN] ## mutex init 0x7f050ee57470
[WRN] ## mutex init 0x7f050ee57508
[WRN] ## mutex init 0x7f050ee575a0
[WRN] ## mutex init 0x7f050ee57638
[WRN] ## mutex init 0x7f050ee576d0
[WRN] ## mutex init 0x7f050ee57768
[WRN] ## mutex init 0x7f050ee57800
[WRN] ## mutex init 0x7f050ee57930
[WRN] ## mutex init 0x7f050ee579b8
[WRN] ## mutex init 0x7f050ee57a00
[WRN] ## mutex init 0x7f050ee57a88
[WRN] ## mutex init 0x7f050ee57ad0
[WRN] ## mutex init 0x7f050ee57b58
[WRN] ## mutex init 0x7f050ee57ba0
[WRN] ## mutex init 0x7f050ee57c28
[WRN] ## mutex init 0x7f050ee57c70
[WRN] ## mutex init 0x7f050ee57d98
[WRN] ## mutex init 0x7f050ee57e30
[WRN] ## mutex init 0x7f050ee57ec8
[WRN] ## mutex init 0x7f050ee57f60
[WRN] ## mutex init 0x7f050ee57ff8
[WRN] ## mutex init 0x7f050ee58090
[WRN] ## mutex init 0x7f050ee58128
[WRN] ## mutex init 0x7f050ee58258
[WRN] ## mutex init 0x7f050ee582e0
[WRN] ## mutex init 0x7f050ee58328
[WRN] ## mutex init 0x7f050ee583b0
[WRN] ## mutex init 0x7f050ee583f8
[WRN] ## mutex init 0x7f050ee58480
[WRN] ## mutex init 0x7f050ee584c8
[WRN] ## mutex init 0x7f050ee58550
[WRN] ## mutex init 0x7f050ee58598
[WRN] ## mutex init 0x7f050ee586c0
[WRN] ## mutex init 0x7f050ee58758
[WRN] ## mutex init 0x7f050ee587f0
[WRN] ## mutex init 0x7f050ee58888
[WRN] ## mutex init 0x7f050ee58920
[WRN] ## mutex init 0x7f050ee589b8
[WRN] ## mutex init 0x7f050ee58a50
[WRN] ## mutex init 0x7f050ee58b80
[WRN] ## mutex init 0x7f050ee58c08
[WRN] ## mutex init 0x7f050ee58c50
[WRN] ## mutex init 0x7f050ee58cd8
[WRN] ## mutex init 0x7f050ee58d20
[WRN] ## mutex init 0x7f050ee58da8
[WRN] ## mutex init 0x7f050ee58df0
[WRN] ## mutex init 0x7f050ee58e78
[WRN] ## mutex init 0x7f050ee58ec0
[WRN] ## mutex init 0x7f050ee58fe8
[WRN] ## mutex init 0x7f050ee59080
[WRN] ## mutex init 0x7f050ee59118
[WRN] ## mutex init 0x7f050ee591b0
[WRN] ## mutex init 0x7f050ee59248
[WRN] ## mutex init 0x7f050ee592e0
[WRN] ## mutex init 0x7f050ee59378
[WRN] ## mutex init 0x7f050ee594a8
[WRN] ## mutex init 0x7f050ee59530
[WRN] ## mutex init 0x7f050ee59578
[WRN] ## mutex init 0x7f050ee59600
[WRN] ## mutex init 0x7f050ee59648
[WRN] ## mutex init 0x7f050ee596d0
[WRN] ## mutex init 0x7f050ee59718
[WRN] ## mutex init 0x7f050ee597a0
[WRN] ## mutex init 0x7f050ee597e8
[WRN] ## mutex init 0x7f050ee59910
[WRN] ## mutex init 0x7f050ee599a8
[WRN] ## mutex init 0x7f050ee59a40
[WRN] ## mutex init 0x7f050ee59ad8
[WRN] ## mutex init 0x7f050ee59b70
[WRN] ## mutex init 0x7f050ee59c08
[WRN] ## mutex init 0x7f050ee59ca0
[WRN] ## mutex init 0x7f050ee59dd0
[WRN] ## mutex init 0x7f050ee59e58
[WRN] ## mutex init 0x7f050ee59ea0
[WRN] ## mutex init 0x7f050ee59f28
[WRN] ## mutex init 0x7f050ee59f70
[WRN] ## mutex init 0x7f050ee59ff8
[WRN] ## mutex init 0x7f050ee5a040
[WRN] ## mutex init 0x7f050ee5a0c8
[WRN] ## mutex init 0x7f050ee5a110
[WRN] ## mutex init 0x7f050ee5a238
[WRN] ## mutex init 0x7f050ee5a2d0
[WRN] ## mutex init 0x7f050ee5a368
[WRN] ## mutex init 0x7f050ee5a400
[WRN] ## mutex init 0x7f050ee5a498
[WRN] ## mutex init 0x7f050ee5a530
[WRN] ## mutex init 0x7f050ee5a5c8
[WRN] ## mutex init 0x7f050ee5a6f8
[WRN] ## mutex init 0x7f050ee5a780
[WRN] ## mutex init 0x7f050ee5a7c8
[WRN] ## mutex init 0x7f050ee5a850
[WRN] ## mutex init 0x7f050ee5a898
[WRN] ## mutex init 0x7f050ee5a920
[WRN] ## mutex init 0x7f050ee5a968
[WRN] ## mutex init 0x7f050ee5a9f0
[WRN] ## mutex init 0x7f050ee5aa38
[WRN] ## mutex init 0x7f050ee5ab60
[WRN] ## mutex init 0x7f050ee5abf8
[WRN] ## mutex init 0x7f050ee5ac90
[WRN] ## mutex init 0x7f050ee5ad28
[WRN] ## mutex init 0x7f050ee5adc0
[WRN] ## mutex init 0x7f050ee5ae58
[WRN] ## mutex init 0x7f050ee5aef0
[WRN] ## mutex init 0x7f050ee5b020
[WRN] ## mutex init 0x7f050ee5b0a8
[WRN] ## mutex init 0x7f050ee5b0f0
[WRN] ## mutex init 0x7f050ee5b178
[WRN] ## mutex init 0x7f050ee5b1c0
[WRN] ## mutex init 0x7f050ee5b248
[WRN] ## mutex init 0x7f050ee5b290
[WRN] ## mutex init 0x7f050ee5b318
[WRN] ## mutex init 0x7f050ee5b360
[WRN] ## mutex init 0x7f050ee5b488
[WRN] ## mutex init 0x7f050ee5b520
[WRN] ## mutex init 0x7f050ee5b5b8
[WRN] ## mutex init 0x7f050ee5b650
[WRN] ## mutex init 0x7f050ee5b6e8
[WRN] ## mutex init 0x7f050ee5b780
[WRN] ## mutex init 0x7f050ee5b818
[WRN] ## mutex init 0x7f050ee5b948
[WRN] ## mutex init 0x7f050ee5b9d0
[WRN] ## mutex init 0x7f050ee5ba18
[WRN] ## mutex init 0x7f050ee5baa0
[WRN] ## mutex init 0x7f050ee5bae8
[WRN] ## mutex init 0x7f050ee5bb70
[WRN] ## mutex init 0x7f050ee5bbb8
[WRN] ## mutex init 0x7f050ee5bc40
[WRN] ## mutex init 0x7f050ee5bc88
[WRN] ## mutex init 0x7f050ee5bdb0
[WRN] ## mutex init 0x7f050ee5be48
[WRN] ## mutex init 0x7f050ee5bee0
[WRN] ## mutex init 0x7f050ee5bf78
[WRN] ## mutex init 0x7f050ee5c010
[WRN] ## mutex init 0x7f050ee5c0a8
[WRN] ## mutex init 0x7f050ee5c140
[WRN] ## mutex init 0x7f050ee5c270
[WRN] ## mutex init 0x7f050ee5c2f8
[WRN] ## mutex init 0x7f050ee5c340
[WRN] ## mutex init 0x7f050ee5c3c8
[WRN] ## mutex init 0x7f050ee5c410
[WRN] ## mutex init 0x7f050ee5c498
[WRN] ## mutex init 0x7f050ee5c4e0
[WRN] ## mutex init 0x7f050ee5c568
[WRN] ## mutex init 0x7f050ee5c5b0
[WRN] ## mutex init 0x7f050ee5c6d8
[WRN] ## mutex init 0x7f050ee5c770
[WRN] ## mutex init 0x7f050ee5c808
[WRN] ## mutex init 0x7f050ee5c8a0
[WRN] ## mutex init 0x7f050ee5c938
[WRN] ## mutex init 0x7f050ee5c9d0
[WRN] ## mutex init 0x7f050ee5ca68
[WRN] ## mutex init 0x7f050ee5cb98
[WRN] ## mutex init 0x7f050ee5cc20
[WRN] ## mutex init 0x7f050ee5cc68
[WRN] ## mutex init 0x7f050ee5ccf0
[WRN] ## mutex init 0x7f050ee5cd38
[WRN] ## mutex init 0x7f050ee5cdc0
[WRN] ## mutex init 0x7f050ee5ce08
[WRN] ## mutex init 0x7f050ee5ce90
[WRN] ## mutex init 0x7f050ee5ced8
[WRN] ## mutex init 0x7f050ee5d000
[WRN] ## mutex init 0x7f050ee5d098
[WRN] ## mutex init 0x7f050ee5d130
[WRN] ## mutex init 0x7f050ee5d1c8
[WRN] ## mutex init 0x7f050ee5d260
[WRN] ## mutex init 0x7f050ee5d2f8
[WRN] ## mutex init 0x7f050ee5d390
[WRN] ## mutex init 0x7f050ee5d4c0
[WRN] ## mutex init 0x7f050ee5d548
[WRN] ## mutex init 0x7f050ee5d590
[WRN] ## mutex init 0x7f050ee5d618
[WRN] ## mutex init 0x7f050ee5d660
[WRN] ## mutex init 0x7f050ee5d6e8
[WRN] ## mutex init 0x7f050ee5d730
[WRN] ## mutex init 0x7f050ee5d7b8
[WRN] ## mutex init 0x7f050ee5d800
[WRN] ## mutex init 0x7f050ee5d928
[WRN] ## mutex init 0x7f050ee5d9c0
[WRN] ## mutex init 0x7f050ee5da58
[WRN] ## mutex init 0x7f050ee5daf0
[WRN] ## mutex init 0x7f050ee5db88
[WRN] ## mutex init 0x7f050ee5dc20
[WRN] ## mutex init 0x7f050ee5dcb8
[WRN] ## mutex init 0x7f050ee5dde8
[WRN] ## mutex init 0x7f050ee5de70
[WRN] ## mutex init 0x7f050ee5deb8
[WRN] ## mutex init 0x7f050ee5df40
[WRN] ## mutex init 0x7f050ee5df88
[WRN] ## mutex init 0x7f050ee5e010
[WRN] ## mutex init 0x7f050ee5e058
[WRN] ## mutex init 0x7f050ee5e0e0
[WRN] ## mutex init 0x7f050ee5e128
[WRN] ## mutex init 0x7f050ee5e250
[WRN] ## mutex init 0x7f050ee5e2e8
[WRN] ## mutex init 0x7f050ee5e380
[WRN] ## mutex init 0x7f050ee5e418
[WRN] ## mutex init 0x7f050ee5e4b0
[WRN] ## mutex init 0x7f050ee5e548
[WRN] ## mutex init 0x7f050ee5e5e0
[WRN] ## mutex init 0x7f050ee5ed58
[WRN] ## mutex init 0x7f050ee5df60
[WRN] ## mutex init 0x7f050ee5e020
[WRN] ## mutex init 0x7f050ee5e0e0
[WRN] ## mutex init 0x7f050ee5e1a0
[WRN] ## mutex init 0x7f050ee5e260
[WRN] ## mutex init 0x7f050ee5e320
[WRN] ## mutex init 0x7f050ee5e3e0
[WRN] ## mutex init 0x7f050ee5e4a0
[WRN] ## mutex init 0x7f050ee5e560

That is definitely more than I expected, but I found no duplicates, so I am not sure what else to do. I used the latest version, and yours is fairly old, so it is possible this has been fixed; however, I also tried v2.2.60 and found no duplicates there either.

Looking at your patch, I see that you are manually telling the compiler to align the mutexes. There is a check for that in the code, and the structures should satisfy it automatically. I am not sure whether this could cause problems; generally the SHM data are managed automatically, and this way the compiler interferes with that.

Thanks so much for the quick feedback!
I will try removing the mutex alignment to see whether it makes any difference.

I'll let you know soon.

I removed the mutex alignment, and the error now moves to this part of the function "_sr_mutex_init", so it fails before reaching the double mutex init:

if (SR_MUTEX_ALIGN_CHECK(lock)) {
    sr_errinfo_new(&err_info, SR_ERR_INTERNAL, "Mutex address not aligned.");
    return err_info;
}
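
The definition of `SR_MUTEX_ALIGN_CHECK` is not shown in this thread, so the following is only a generic sketch of what an address alignment check usually tests, together with a helper for placing an object at an aligned offset inside a memory block. The names `is_aligned` and `align_up` are hypothetical and not taken from Sysrepo.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when addr is a multiple of align, 0 otherwise. */
static int
is_aligned(const void *addr, size_t align)
{
    assert(align != 0);
    return ((uintptr_t)addr % align) == 0;
}

/*
 * Rounds offset up to the next multiple of align.
 * align must be a power of two for the bit mask to be valid.
 */
static size_t
align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}
```

A mutex stored at `base + align_up(offset, _Alignof(pthread_mutex_t))` passes `is_aligned` whenever `base` itself is suitably aligned. Since the size and alignment of `pthread_mutex_t` are platform-specific, a layout that passes such a check on one system is not guaranteed to pass it on another.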

The error only occurs on QNX. On Linux it runs fine even without the alignment.
I'll consider upgrading Sysrepo, but it will take time. You can close this issue; if I see the same error after the upgrade, I will come back to it.

Thank you!