Decryption support
furuholm opened this issue · comments
I have been working on adding support for decrypting messages and if it is ok I would like to submit a PR with this functionality once I am done.
I have an initial version that almost works at https://github.com/furuholm/libsignal-protocol-rs/tree/decryption. However, it turns out that the decryption code in the underlying C library requires the mutex implementation to be reentrant, which parking_lot::RawMutex is not. When replacing RawMutex with ReentrantMutex I get the following error message:
error[E0277]: the type `std::cell::UnsafeCell<usize>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
--> libsignal-protocol/src/context.rs:410:17
|
410 | let _ = std::panic::catch_unwind(|| {
| ^^^^^^^^^^^^^^^^^^^^^^^^ `std::cell::UnsafeCell<usize>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
|
= help: within `&context::State`, the trait `std::panic::RefUnwindSafe` is not implemented for `std::cell::UnsafeCell<usize>`
= note: required because it appears within the type `std::cell::Cell<usize>`
= note: required because it appears within the type `lock_api::remutex::RawReentrantMutex<parking_lot::raw_mutex::RawMutex, parking_lot::remutex::RawThreadId>`
= note: required because it appears within the type `lock_api::remutex::ReentrantMutex<parking_lot::raw_mutex::RawMutex, parking_lot::remutex::RawThreadId, ()>`
= note: required because it appears within the type `context::State`
= note: required because it appears within the type `&context::State`
= note: required because of the requirements on the impl of `std::panic::UnwindSafe` for `&&context::State`
= note: required because it appears within the type `[closure@libsignal-protocol/src/context.rs:410:42: 413:10 state:&&context::State, level:&log::Level, message:&&str]`
Just like the message says, it is caused by RawReentrantMutex containing a cell::Cell, which is not RefUnwindSafe.
Any ideas on how to solve this?
> I have been working on adding support for decrypting messages and if it is ok I would like to submit a PR with this functionality once I am done.
That's awesome! I'd be happy to add more functionality to this crate.
> Any ideas on how to solve this?
Looking at the docs for std::panic::UnwindSafe, my interpretation is that the RefUnwindSafe bound exists as a speed bump so we don't accidentally break invariants by accessing data (in this case I'm guessing it's some sort of recursion depth counter) when code panics.
The error message involves recursive locks, so I'm assuming we could accidentally deadlock if we naively used std::panic::AssertUnwindSafe to make the compiler think our mutex is RefUnwindSafe.
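To make that concern concrete, here is a minimal sketch using only the standard library. DepthCounter is a hypothetical stand-in for a lock that tracks its recursion depth in a Cell (an assumption for illustration, not parking_lot's actual layout). It shows AssertUnwindSafe silencing the compiler while a panic leaves the counter in an inconsistent state:

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

// Hypothetical stand-in for a lock that keeps its recursion depth in a Cell.
struct DepthCounter {
    depth: Cell<usize>,
}

// Increments the depth, optionally panics, then decrements it again.
// Returns true if the closure finished without panicking.
fn bump_and_maybe_panic(counter: &DepthCounter, should_panic: bool) -> bool {
    // Cell is not RefUnwindSafe, so without AssertUnwindSafe this closure
    // would be rejected with error E0277, like the one in this thread.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        counter.depth.set(counter.depth.get() + 1);
        if should_panic {
            panic!("simulated failure while the depth is incremented");
        }
        counter.depth.set(counter.depth.get() - 1);
    }));
    result.is_ok()
}

fn main() {
    let counter = DepthCounter { depth: Cell::new(0) };

    // Normal path: the depth goes back to zero.
    assert!(bump_and_maybe_panic(&counter, false));
    assert_eq!(counter.depth.get(), 0);

    // Panic path: the decrement never runs, so the invariant is broken.
    assert!(!bump_and_maybe_panic(&counter, true));
    assert_eq!(counter.depth.get(), 1);
}
```

A real lock with a stale depth like this could look permanently held, which is the kind of deadlock risk described above.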
You may want to open an issue against the parking_lot crate directly and ask them how to handle unwinding while a re-entrant lock is held. Normally you'd use some sort of poisoning mechanism, but I don't know enough about the subtleties of locking to suggest a solution.
The standard library's Mutex implements poisoning, so they've manually implemented UnwindSafe; maybe parking_lot needs to do that too?
Thanks for the feedback!
The solution was quite simple. I just moved the acquiring of log_func out of the catch_unwind scope. See line 411 here.
Once this was fixed I could replace the current mutex implementation with a reentrant mutex. This type only provides a RAII-based API, so I put the lock guards in a Vec to keep them alive until unlock is called. Note that unlock is not protected, but my thinking was that unlock should never be invoked unless the lock has been acquired by lock, so I think that should be ok.
That was the easy part. Testing this turned out to be much more challenging. I tried two approaches:
- Test Context directly: This turned out to be challenging as Rust does its best to stop us from sharing references between threads. The solution I came up with has a race condition that is caused by sharing Context rather than the lock itself.
  Note that I added Sync and Send implementations to Context here to allow passing it between threads. This is something that needs to be studied in more detail.
- Test passing encrypted messages between two threads: This works but it is hard to say if it proves anything about the mutex implementation.
  Note: to avoid adding Sync and Send to PreKeyBundle I implemented a wrapper (PreKeyBundleWrapper). This should probably be cleaned up in some way before (potentially) adding this test to master.
I created a branch for each of the above since the first is broken and the second is kind of flawed.
- https://github.com/furuholm/libsignal-protocol-rs/tree/
- https://github.com/furuholm/libsignal-protocol-rs/tree/test_thread_decrypt
The reentrant mutex implementation is however pushed to https://github.com/furuholm/libsignal-protocol-rs/tree/decryption.
Do you have any suggestion on how to proceed? Do you want me to create a PR without the tests or should we fix them first?
> The solution was quite simple. I just moved the acquiring of log_func out of the catch_unwind scope. See line 411 here.
You've got to be careful here because if the lock is poisoned then calling unwrap() will trigger a panic.
Moving log_func outside of catch_unwind means a poisoned lock will now unwind into the calling function (written in C). Unwinding across the FFI boundary is UB, so any Rust code which may panic needs to be executed inside a catch_unwind.
You could remove the unwrap() and pattern match on the result of lock() though.
if let Ok(message) = std::str::from_utf8(buffer) {
    // we can't log the errors that occur while logging errors, so just
    // drop them on the floor...
    // a poisoned lock yields Err here, so there is no unwrap() left to panic
    if let Ok(log_func) = state.log_func.lock() {
        let _ = std::panic::catch_unwind(|| {
            log_func(level, message);
        });
    }
}
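As a self-contained illustration of the same idea, here is a sketch using only std. State, LogFunc (simplified to take just a message), log_without_unwinding, poison_lock and new_state are hypothetical names for this example, not the crate's real definitions. It shows that a poisoned lock is skipped quietly instead of panicking outside catch_unwind:

```rust
use std::panic;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical, simplified callback type (the real one also takes a level).
type LogFunc = Box<dyn Fn(&str) + Send>;

struct State {
    log_func: Mutex<LogFunc>,
}

// Returns true if the message was logged, false if the lock was poisoned
// or the callback panicked. It never unwinds into the caller.
fn log_without_unwinding(state: &State, message: &str) -> bool {
    // A poisoned lock produces Err here instead of a panic from unwrap().
    if let Ok(log_func) = state.log_func.lock() {
        // Only the user-supplied callback runs inside catch_unwind.
        return panic::catch_unwind(|| (*log_func)(message)).is_ok();
    }
    false
}

// Poisons the mutex by panicking on another thread while holding the guard.
fn poison_lock(state: &Arc<State>) {
    let state = Arc::clone(state);
    let _ = thread::spawn(move || {
        let _guard = state.log_func.lock().unwrap();
        panic!("simulated panic while holding the lock");
    })
    .join();
}

fn new_state() -> Arc<State> {
    let log_func: LogFunc = Box::new(|msg: &str| println!("log: {msg}"));
    Arc::new(State { log_func: Mutex::new(log_func) })
}

fn main() {
    let state = new_state();
    assert!(log_without_unwinding(&state, "before poisoning"));

    poison_lock(&state);
    // The poisoned lock is reported as "not logged" rather than as a panic.
    assert!(!log_without_unwinding(&state, "after poisoning"));
}
```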
> That was the easy part. Testing this turned out to be much more challenging.
How do people normally test that locking is done correctly? I know one method is to verify it using logical analysis or special tools like Go's race detector, but that's not my area of expertise.
Another question to ask is whether we even need to test it. We could argue that using locking correctly is libsignal-protocol-c's responsibility.
Adding Send and Sync implementations isn't something I'd do lightly, so we may want to get another opinion here. I know you need to manually implement Send and Sync for types containing raw pointers because it acts as a speed bump for people writing FFI code, letting them know they need to check the C code being interacted with is also thread-safe.
Another thing to keep in mind is that if our Context is neither Send nor Sync, we can statically ensure it won't be sent to or shared with other threads. So all lock() and unlock() functions could be safely rewritten as no-ops.
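A minimal sketch of what no-op locking could look like, assuming the C library takes lock and unlock callbacks of the shape `void (*)(void *user_data)`. The Context struct, its field and nested_locking_completes are hypothetical stand-ins for illustration. Because of the raw pointer field, the struct is automatically neither Send nor Sync:

```rust
use std::os::raw::c_void;
use std::ptr;

// Hypothetical stand-in for the crate's Context. The raw pointer field means
// the compiler will not implement Send or Sync for it automatically.
pub struct Context {
    user_data: *mut c_void,
}

impl Context {
    pub fn new() -> Self {
        Context { user_data: ptr::null_mut() }
    }
}

// No-op lock callback (assumed signature). A Context can never be reached
// from a second thread, so there is nothing to guard against.
pub extern "C" fn lock_function(_user_data: *mut c_void) {}

// No-op unlock callback (assumed signature).
pub extern "C" fn unlock_function(_user_data: *mut c_void) {}

// Mimics the nested locking the C library performs during decryption.
// With no-op callbacks the re-entrant call cannot deadlock.
pub fn nested_locking_completes(ctx: &Context) -> bool {
    lock_function(ctx.user_data);
    lock_function(ctx.user_data); // re-entrant acquisition
    unlock_function(ctx.user_data);
    unlock_function(ctx.user_data);
    true
}

fn main() {
    let ctx = Context::new();
    assert!(nested_locking_completes(&ctx));
}
```

This only stays sound as long as nobody adds an `unsafe impl Send` or `unsafe impl Sync` for the type.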
> However, it turns out that the decryption code in the underlying C library requires the mutex implementation to be reentrant which parking_lot::RawMutex is not.
How did you initially determine we need a reentrant lock? If we can remove the requirement for reentrant locking these problems should all go away.
I will play the Rust noob card here, but not spotting that I moved an unwrap out from a catch_unwind is kind of bad 😬. Your suggestion is obviously the correct one. Good catch!
Using the lockless approach sounds like an excellent idea! One thread, one Context!
Went ahead and implemented this. I added a comment about the reasoning behind the noop locking strategy inside lock_function. Should this be documented somewhere else as well?
If we go down this path I think the other discussion is kind of obsolete, but I still want to address a couple of your questions.
- The remutex is (was) needed as decrypt_pre_key_message invokes session_cipher_decrypt_pre_key_signal_message, which in turn invokes session_cipher_decrypt_from_record_and_signal_message. The latter two both invoke signal_lock (which uses Context::lock_function).
- Regarding testing of mutexes, I tried to take inspiration from the tests in parking_lot. The issue I had was having to wrap the lock under test in another lock to be able to transfer it between threads. This did not end well :)
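The nested call chain in the first point can be reproduced in miniature. This sketch uses std::sync::Mutex as an example of a non-reentrant lock, with hypothetical outer and inner functions standing in for the two C functions that both call signal_lock. It uses try_lock for the second acquisition so the failure is observable instead of hanging:

```rust
use std::sync::{Mutex, TryLockError};

// Hypothetical stand-in for a context guarded by a non-reentrant lock.
struct Ctx {
    lock: Mutex<()>,
}

// Stands in for the inner C function: it tries to take the lock itself.
// Returns true if the lock could be acquired.
fn inner(ctx: &Ctx) -> bool {
    match ctx.lock.try_lock() {
        Ok(_guard) => true,
        // Already held, here by the same thread further up the call stack.
        Err(TryLockError::WouldBlock) => false,
        Err(TryLockError::Poisoned(_)) => false,
    }
}

// Stands in for the outer C function: it takes the lock, then calls inner.
fn outer(ctx: &Ctx) -> bool {
    let _guard = ctx.lock.lock().unwrap();
    inner(ctx)
}

fn main() {
    let ctx = Ctx { lock: Mutex::new(()) };

    // Called on its own, the inner function gets the lock.
    assert!(inner(&ctx));
    // Called from outer, the second acquisition fails. With a blocking
    // lock() instead of try_lock() this would deadlock or panic.
    assert!(!outer(&ctx));
    // Once outer has returned, the lock is free again.
    assert!(inner(&ctx));
}
```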
Note: The noop locking is now part of the decryption branch. Depending on your preferences this could potentially be a separate PR though.
All good, it's always nice to have a second pair of eyes look over your code 😁
> I added a comment about the reasoning behind the noop locking strategy inside lock_function. Should this be documented somewhere else as well?
How about using static_assertions::assert_not_impl_all!(Context: Send, Sync) to assert that our Context type isn't thread-safe? Then just above it you'll be able to mention why we use no-op locking and link to this conversation. That way we can ensure future changes won't accidentally make the Context type Send.
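For readers without the static_assertions crate at hand, the same kind of compile-time check can be sketched with std only, using a trait-ambiguity trick. The Context struct and its PhantomData marker are hypothetical stand-ins. As far as I can tell, assert_not_impl_all! only fails when the type implements every listed trait, so assert_not_impl_any! may be the stricter fit if the goal is "neither Send nor Sync"; the sketch below checks each trait separately:

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for the crate's Context. The PhantomData<*mut ()>
// marker makes it neither Send nor Sync.
pub struct Context {
    _not_thread_safe: PhantomData<*mut ()>,
}

impl Context {
    pub fn new() -> Self {
        Context { _not_thread_safe: PhantomData }
    }
}

// We rely on no-op locking, so Context must stay !Send and !Sync.
// If Context ever implemented Send or Sync, the `_` below would match more
// than one impl and type inference would fail, breaking the build.
const _: fn() = || {
    trait AmbiguousIfImpl<A> {
        fn some_item() {}
    }
    // Applies to every type.
    impl<T: ?Sized> AmbiguousIfImpl<()> for T {}

    #[allow(dead_code)]
    struct IsSend;
    // Applies only to Send types.
    impl<T: ?Sized + Send> AmbiguousIfImpl<IsSend> for T {}

    #[allow(dead_code)]
    struct IsSync;
    // Applies only to Sync types.
    impl<T: ?Sized + Sync> AmbiguousIfImpl<IsSync> for T {}

    let _ = <Context as AmbiguousIfImpl<_>>::some_item;
};

fn main() {
    // Compiling at all is the real test; this just exercises the type.
    let _ctx = Context::new();
    assert_eq!(std::mem::size_of::<Context>(), 0);
}
```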
> The noop locking is now part of the decryption branch. Depending on your preferences this could potentially be a separate PR though.
If you'd like to land it all in one big PR or break each set of changes up into their own smaller PRs I'm happy either way. I guess smaller PRs might make code review easier though.
Implemented in #57.