mozilla / neqo

Neqo, the Mozilla Firefox implementation of QUIC in Rust

Home Page: https://firefox-source-docs.mozilla.org/networking/http/http3.html

Crashing in nss init test

martinthomson opened this issue

I'm not sure if this is misuse of NSS APIs or a bug in NSS initialization, but this is happening quite a bit to me. I don't have time right now to investigate.

     Running tests/init.rs (target/debug/deps/init-18835beddefbfa4e)

running 2 tests
Assertion failure: lock != NULL, at ../../../../pr/src/pthreads/ptsynch.c:175
test init_withdb ... ok
error: test failed, to rerun pass `-p neqo-crypto --test init`

Caused by:
  process didn't exit successfully: `/home/martin/code/neqo/target/debug/deps/init-18835beddefbfa4e` (signal: 6, SIGABRT: process abort signal)
Stack trace
* thread #3, name = 'init_withdb', stop reason = signal SIGABRT
  * frame #0: 0x00007ffff7dba9fc libc.so.6`pthread_kill + 300
    frame #1: 0x00007ffff7d66476 libc.so.6`raise + 22
    frame #2: 0x00007ffff7d4c7f3 libc.so.6`abort + 211
    frame #3: 0x00007ffff7f6a513 libnspr4.so`PR_Assert(s="lock != NULL", file="../../../../pr/src/pthreads/ptsynch.c", ln=175) at prlog.c:571:5
    frame #4: 0x00007ffff7f8713a libnspr4.so`PR_Lock(lock=0x0000000000000000) at ptsynch.c:175:5
    frame #5: 0x00007ffff7f79f45 libnspr4.so`PR_CallOnce(once=0x00005555556d02b8, func=(init-d2141f227c08e7ce`nss_doLockInit at nssinit.c:534:1)) at prinit.c:774:5
    frame #6: 0x000055555562eb65 init-d2141f227c08e7ce`nss_Init(configdir="/home/martin/code/neqo/test-fixture/db", certPrefix="", keyPrefix="", secmodName="secmod.db", updateDir="", updCertPrefix="", updKeyPrefix="", updateID="", updateName="", initContextPtr=0x0000000000000000, initParams=0x0000000000000000, readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, noRootInit=0, optimizeSpace=0, noSingleThreadedModules=0, allowAlreadyInitializedModules=0, dontFinalizeModules=0) at nssinit.c:580:9
    frame #7: 0x000055555562f1d9 init-d2141f227c08e7ce`NSS_Initialize(configdir="/home/martin/code/neqo/test-fixture/db", certPrefix="", keyPrefix="", secmodName="secmod.db", flags=1) at nssinit.c:889:12
    frame #8: 0x00005555555e36a2 init-d2141f227c08e7ce`neqo_crypto::init_db::_$u7b$$u7b$closure$u7d$$u7d$::h6cccd5c60e5c4ea4 at lib.rs:163:13
    frame #9: 0x00005555555e3d32 init-d2141f227c08e7ce`std::sync::once_lock::OnceLock$LT$T$GT$::get_or_init::_$u7b$$u7b$closure$u7d$$u7d$::h2550b0c13514ffb8 at once_lock.rs:250:50
    frame #10: 0x00005555555e3c0b init-d2141f227c08e7ce`std::sync::once_lock::OnceLock$LT$T$GT$::initialize::_$u7b$$u7b$closure$u7d$$u7d$::h87f60e1ed9e123a1(p=0x00007ffff7a15110) at once_lock.rs:376:19
    frame #11: 0x00005555555e4167 init-d2141f227c08e7ce`std::sync::once::Once::call_once_force::_$u7b$$u7b$closure$u7d$$u7d$::h146f8429fe820fd0(p=0x00007ffff7a15110) at once.rs:208:40
    frame #12: 0x00005555555e489a init-d2141f227c08e7ce`std::sys_common::once::futex::Once::call::hfa47d030df88ef30(self=0x00005555556d0200, ignore_poisoning=true, f=0x00007ffff7a152a8) at futex.rs:124:21
    frame #13: 0x00005555555e4037 init-d2141f227c08e7ce`std::sync::once::Once::call_once_force::h3e05b6c06111e63d(self=0x00005555556d0200, f={closure_env#0}<core::result::Result<neqo_crypto::NssLoaded, neqo_crypto::err::Error>, std::sync::once_lock::{impl#0}::get_or_init::{closure_env#0}<core::result::Result<neqo_crypto::NssLoaded, neqo_crypto::err::Error>, neqo_crypto::init_db::{closure_env#0}<&str>>, !> @ 0x00007ffff7a152f8) at once.rs:208:9
    frame #14: 0x00005555555e3b99 init-d2141f227c08e7ce`std::sync::once_lock::OnceLock$LT$T$GT$::initialize::h2a16e900f7cbd97b(self=0x00005555556d01c8, f={closure_env#0}<core::result::Result<neqo_crypto::NssLoaded, neqo_crypto::err::Error>, neqo_crypto::init_db::{closure_env#0}<&str>> @ 0x00007ffff7a15320) at once_lock.rs:375:9
    frame #15: 0x00005555555e3e04 init-d2141f227c08e7ce`std::sync::once_lock::OnceLock$LT$T$GT$::get_or_try_init::hf774e982ab37704a(self=0x00005555556d01c8, f={closure_env#0}<core::result::Result<neqo_crypto::NssLoaded, neqo_crypto::err::Error>, neqo_crypto::init_db::{closure_env#0}<&str>> @ 0x00007ffff7a15398) at once_lock.rs:298:9
    frame #16: 0x00005555555e3cfd init-d2141f227c08e7ce`std::sync::once_lock::OnceLock$LT$T$GT$::get_or_init::hba25c0f3d5f8da29(self=0x00005555556d01c8, f={closure_env#0}<&str> @ 0x00007ffff7a15400) at once_lock.rs:250:15
    frame #17: 0x00005555555e309d init-d2141f227c08e7ce`neqo_crypto::init_db::hf7895a0b6d64f1f4(dir=(data_ptr = "/home/martin/code/neqo/test-fixture/dbneqo-crypto/tests/init.rs", length = 38)) at lib.rs:149:15
    frame #18: 0x00005555555e22be init-d2141f227c08e7ce`init::init_withdb::h1911f7a1b0bd088f at init.rs:47:5
    frame #19: 0x00005555555e1c87 init-d2141f227c08e7ce`init::init_withdb::_$u7b$$u7b$closure$u7d$$u7d$::hd4e1bf586e84a384((null)=0x00007ffff7a155d6) at init.rs:46:17

Interesting! I don't hit this when I do cargo nextest run (which I normally use), but I can reproduce with cargo test.

nextest might not run the two tests concurrently. I get the same crash when I run the test binary directly. It seems to be down to a race or contention between concurrent attempts to initialize.
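One way to check the concurrency theory is to stop the tests from overlapping and see whether the crash goes away. The sketch below is not neqo code: `GLOBAL_STATE_LOCK`, `serialize`, `touch_global_state` and `max_overlap` are hypothetical names, and the counters only stand in for whatever process-global state the real tests touch. It relies on the fact that `cargo test` runs the tests of one binary as threads in a single process, whereas nextest runs each test in its own process.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::thread;

// Hypothetical lock shared by every test that touches process-global state.
static GLOBAL_STATE_LOCK: Mutex<()> = Mutex::new(());

// Counters used only to observe how many callers overlap.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);
static MAX_ACTIVE: AtomicUsize = AtomicUsize::new(0);

/// Take the shared lock. A panic in another test poisons the mutex, so
/// recover the guard instead of failing every later test as well.
fn serialize() -> MutexGuard<'static, ()> {
    GLOBAL_STATE_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Stand-in for a test body that initializes a global library.
fn touch_global_state() {
    let _guard = serialize();
    let now = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
    MAX_ACTIVE.fetch_max(now, Ordering::SeqCst);
    thread::yield_now(); // give other threads a chance to interleave
    ACTIVE.fetch_sub(1, Ordering::SeqCst);
}

/// Run `threads` callers at once and report the largest overlap observed.
fn max_overlap(threads: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(touch_global_state))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    MAX_ACTIVE.load(Ordering::SeqCst)
}
```

With the guard held, no two callers are ever inside the critical section together. A quicker diagnostic that needs no code change is `cargo test -p neqo-crypto --test init -- --test-threads=1`, which makes the test harness run one test at a time. Either approach only hides a race; it does not fix one inside the library.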

I took a brief look at the NSS initialization and there are a few things that might point at problems. For instance, the lack of locking on NSS_IsInitialized is concerning (I can guess why, but it's blatantly unsafe). The fact that most of the routine is not covered by a mutex is also a little worrying. Neither immediately points to the problem, but this probably needs some investigation (it might be the true source of our NSS initialization woes, you know, the ones from #482).
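To make the concern about an unlocked "is initialized" check concrete, here is a minimal model of the check-then-act pattern. It is an illustration only and does not reproduce NSS code: the `Library` type, its fields and `count_inits` are all hypothetical, and the model says nothing about where the actual assertion failure comes from.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical model of a library with one-time global setup (not NSS).
struct Library {
    initialized: AtomicBool,
    init_calls: AtomicUsize,
    init_lock: Mutex<()>,
}

impl Library {
    fn new() -> Self {
        Library {
            initialized: AtomicBool::new(false),
            init_calls: AtomicUsize::new(0),
            init_lock: Mutex::new(()),
        }
    }

    /// Reads the flag without holding any lock.
    fn is_initialized(&self) -> bool {
        self.initialized.load(Ordering::SeqCst)
    }

    /// The setup work. The flag is only set at the end, so there is a
    /// window in which other threads still see "not initialized".
    fn do_init(&self) {
        thread::yield_now();
        self.init_calls.fetch_add(1, Ordering::SeqCst);
        self.initialized.store(true, Ordering::SeqCst);
    }

    /// Check-then-act with no lock: several threads can pass the check.
    fn init_unguarded(&self) {
        if !self.is_initialized() {
            self.do_init();
        }
    }

    /// The same check made while holding a mutex: at most one thread
    /// ever runs the setup.
    fn init_guarded(&self) {
        let _guard = self.init_lock.lock().unwrap();
        if !self.is_initialized() {
            self.do_init();
        }
    }
}

/// Start `threads` initializers at the same moment and count setup runs.
fn count_inits(threads: usize, guarded: bool) -> usize {
    let lib = Arc::new(Library::new());
    let barrier = Arc::new(Barrier::new(threads));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lib = Arc::clone(&lib);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                barrier.wait(); // line the threads up to maximize contention
                if guarded {
                    lib.init_guarded();
                } else {
                    lib.init_unguarded();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    lib.init_calls.load(Ordering::SeqCst)
}
```

`count_inits(8, true)` always reports a single setup run. `count_inits(8, false)` can report more than one, depending on scheduling, which is the kind of intermittent behaviour that shows up under `cargo test` but not when each test has its own process.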

It seems to be failing in the "twice" tests I added as part of 3151adc. At least, if I comment those out, I don't get the failure anymore.

Specifically, init_twice_withdb.