palacaze / sigslot

A simple C++14 signal-slots implementation

Thread-safety of disconnecting slots

saierd opened this issue · comments

Take the following example code:

sigslot::signal<> signal;

struct Observer {
    Observer(sigslot::signal<>& signal) {
        connection = signal.connect([this]() {
            // Use state...
        });
    }

    int state;
    sigslot::scoped_connection connection;
};

auto observer = std::make_unique<Observer>(signal);

// Thread 1:
signal(); // (1)

// Thread 2:
observer.reset(); // (2)

I thought this should be thread-safe, since Observer is built in such a way that the connection goes out of scope before the slot's shared state (the observer instance and its state member) is destroyed.

It is not, though, because (1) creates a copy of the list of slots at the beginning of the signal emission. The disconnect triggered by (2) then happens asynchronously, and the callback can potentially be executed after the observer has been destroyed.

  • Is this behavior on purpose? I know that signal::operator() documents that thread-safety only applies to the signal itself, but it seems like this makes it very hard to manage the lifetime of the observer. The lifetime of the shared state cannot be bound to the lifetime of the scoped_connection, but must be coupled to the lifetime of the lambda.
  • If it is on purpose, I think it would make sense to provide an option that removes the copy from signal::operator() and keeps the mutex locked while the slots are running. I don't care about the additional locking in my application, but this would make managing the lifetimes of the callbacks a lot easier.
  • If I am not mistaken, this also applies to the sigslot::observer class provided by this library. The example for that class uses global state, so it doesn't have this problem. But I think this should be documented, since using the observer class as in my example seems very natural.

Thank you for the detailed analysis. I agree with your assessment that using scoped_connections in a multi-threaded context requires careful planning, and that the current documentation does not offer enough guidance and/or warnings.

I actually invoked the slots under lock in the past, but it caused thread-safety concerns; more precisely, it made deadlocks possible.

One way of mitigating the problem may be to copy a list of weak references to the actual slot objects, but that does not fix it entirely: in your use case, Thread 1 may start invoking the lambda while reset() is being called from Thread 2.

The best way to protect against such behavior is to rely on the slot tracking feature offered by the library. Here is a worked example that shows how to do it:

#include <atomic>
#include <chrono>
#include <iostream>
#include <memory>
#include <thread>
#include <sigslot/signal.hpp>

void thread1(sigslot::signal<> &sig) {
    const auto deadline = std::chrono::steady_clock::now()
                          + std::chrono::milliseconds(200);

    while (std::chrono::steady_clock::now() < deadline) {
        for (int i = 0; i < 1000; ++i) {
            sig();
        }
    }
}

struct Observer {
    static auto create(sigslot::signal<> &signal) {
        auto obs = std::make_shared<Observer>();
        signal.connect(&Observer::slot, obs);
        return obs;
    }

    std::atomic<int> state{};

private:
    void slot() {
        state++;
    }
};

int main() {
    sigslot::signal<> sig;
    auto observer = Observer::create(sig);

    std::thread t(thread1, std::ref(sig));

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    int v = observer->state;
    observer.reset();

    std::cout << "Got " << v << std::endl;
    t.join();
}

The main selling point of this approach is that the lifetime of the Observer gets extended via weak_ptr locking while the slot is being invoked. This has the obvious advantage of avoiding heavy locking. Would that be an acceptable solution for you?
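
To illustrate the mechanism, here is a rough sketch of the idea (illustrative only, not the library's actual internals): the signal keeps a weak reference next to each callable and locks it around every invocation, so the connected object cannot be destroyed while its slot is running.

#include <functional>
#include <memory>

// Illustrative sketch of weak_ptr based slot tracking; the names are made up
// and this is not sigslot's real implementation.
struct tracked_slot {
    std::weak_ptr<void> tracker;   // weak reference to the connected object
    std::function<void()> call;    // the bound slot

    void invoke() const {
        if (auto keep_alive = tracker.lock()) {
            // the shared_ptr keeps the object alive for the duration of the call
            call();
        }
        // if the tracker has expired, the slot is simply skipped
    }
};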

The design and thread safety of sigslot::observer has actually been discussed in bug #19, and should be fine, provided the derived class calls disconnect_all() in its destructor. I will admit that I do not use observer inheritance and have not tested this feature much though.

Thank you for your reply.

The shared_ptr approach is a valid solution to the problem. We did think about using it, but decided against it, since it requires storing the "observer" in a shared_ptr even when that is otherwise not necessary. Keeping the lock seemed like an easier solution that fixes this without requiring any changes to how the signals are used.

Could you elaborate on the possible deadlocks when calling the slots under the lock? We are currently running sigslot with the following patch:

@@ -686,10 +686,8 @@ public:
         if (m_block)
             return;
 
+        lock_type lock(m_mutex);
+
         // copy slots to execute them out of the lock
         cow_copy_type<list_type, Lockable> copy = slots_copy();
 
@@ -923,7 +921,7 @@ using signal_st = signal_base<detail::null_mutex, T...>;
  * Recursive signal emission and emission cycles are supported too.
  */
 template <typename... T>
-using signal = signal_base<std::mutex, T...>;
+using signal = signal_base<std::recursive_mutex, T...>;
 
 } // namespace sigslot

This fixed the race condition for us and works as expected so far.


Regarding sigslot::observer, I am quite sure that using it as in my first example is not safe. Automating the slot destruction basically works the same way as the scoped_connection member in my example Observer class. Disconnecting explicitly in the destructor does not change anything as long as the connection is the last member in the class and there is no inheritance involved.

Sure, your patch triggers a deadlock right away in the test case dedicated to detecting this problem.
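
For illustration, here is a hedged sketch (all names made up, not the actual test) of the kind of deadlock that invoking slots under the signal's mutex makes possible: a slot that acquires an application lock can deadlock against another thread that holds that same lock and then touches the signal.

#include <mutex>
#include <thread>
#include <sigslot/signal.hpp>

// With slots invoked while the signal's mutex is held (as in the patch above),
// this lock-order inversion can deadlock: thread A holds the signal mutex and
// waits for app_mutex, while thread B holds app_mutex and waits for the signal
// mutex. With stock sigslot, which copies the slot list and invokes outside the
// lock, the program terminates normally.
int main() {
    sigslot::signal<> sig;
    std::mutex app_mutex;

    // Slot runs under the signal's mutex, then takes app_mutex.
    sig.connect([&] {
        std::lock_guard<std::mutex> lk(app_mutex);
        // ... touch application state ...
    });

    // Thread A: signal mutex -> app_mutex.
    std::thread a([&] {
        for (int i = 0; i < 100000; ++i)
            sig();
    });

    // Thread B: app_mutex -> signal mutex (connect/disconnect need the signal mutex).
    std::thread b([&] {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> lk(app_mutex);
            auto c = sig.connect([] {});
            c.disconnect();
        }
    });

    a.join();
    b.join();
}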

Do you have any hard memory or CPU constraints that make you avoid shared_ptr? It is not super cheap, but it could also be argued that both keeping the lock longer and using a recursive mutex offset the cost of the shared_ptr.

I will look for a way to improve lifetime management.

Thank you, I did not think about such complicated situations for emitting signals. In our use case, the signals are always emitted from the same thread, so this cannot happen.

We don't want to avoid shared_ptr because of its runtime cost. We want to avoid having to design every class around shared_ptr ownership just so it can receive signals. We have existing classes which are not stored in a shared_ptr, and it would be very convenient if those could receive signals simply by adding a scoped_connection.

Yes, thread-safe signal emission is trickier than it appears.

One idea I can think of would be to add a lightweight spin mutex per slot and lock it before slot invocation and slot destruction. That would be cheap most of the time because contention would be quite small.

With a normal spin mutex, this would also make the slot callables thread-safe as a side effect; alternatively, an RW spin mutex could be designed to allow concurrent invocation of slots.
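
For concreteness, a per-slot lock along those lines could be as small as an atomic_flag based spin mutex, acquired both before invoking the slot and before destroying it. This is just a sketch of the idea, not actual library code:

#include <atomic>

// Minimal spin mutex that could be embedded in each slot (illustration only).
class spin_mutex {
public:
    void lock() noexcept {
        while (m_flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait; per-slot contention is expected to be rare
        }
    }

    void unlock() noexcept {
        m_flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};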

So, I have come up with a preliminary design for this that gives strong safety guarantees for connections and lambdas in multi-threaded contexts. It should basically allow you to write the code the way you expected.

I pushed it on the safe-observer branch.

Would you mind testing it to see if it solves your problems?

My first implementation was not satisfactory, so I pushed a second proof of concept that should work as expected. I added a couple of tests to check the behaviour.

Thank you for creating the branch and sorry for the delay. We were busy with other things, but tested your changes now.

There seems to be a problem with the test_threaded_disconnection test on the branch. It fails quite often. I am not sure if this is a problem with the implementation or the test.

The implementation actually seems to work for our problem, though. We also added a sigslot test with our exact situation and it passes on the branch (and does not pass on master):

#include <cassert>
#include <chrono>
#include <thread>
#include <sigslot/signal.hpp>

static void test_disconnect_while_slot_is_running() {
    sigslot::signal<> sig;

    int shared_state = 1;

    auto slot = [&shared_state]() {
        shared_state = 1;
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        assert(shared_state != 0);
    };

    sigslot::scoped_connection conn = sig.connect(slot);

    auto emit = [&sig]() {
        sig();
    };

    std::thread emitter(emit);

    std::this_thread::sleep_for(std::chrono::milliseconds(20));

    assert(shared_state != 0);

    conn.disconnect();
    shared_state = 0;

    emitter.join();
}

> The design and thread safety of sigslot::observer has actually been discussed in bug #19, and should be fine, provided the derived class calls disconnect_all() in its destructor. I will admit that I do not use observer inheritance and have not tested this feature much though.

I think the issue also exists in sigslot::observer. At least I cannot find where the observer's lock is acquired before a slot is executed, meaning the observer can go out of scope while a slot is still executing. I think this (Googletest) test case proves it:

#include <future>
#include <atomic>
#include <chrono>
#include <thread>

#include <gtest/gtest.h>
#include <sigslot/signal.hpp>

namespace {

class SigslotObserver : public sigslot::observer {
private:
    std::promise<void> _called_prom;
    std::atomic<bool> _called{false};
    std::atomic<bool> &_executing;

public:
    SigslotObserver(std::atomic<bool> &executing) :
        _executing(executing)
    {}

    ~SigslotObserver() {
        disconnect_all();
    }

    void slot(const int &msg) {
        using namespace std::literals::chrono_literals;
        _executing = true;
        if (!_called) {
            _called_prom.set_value();
            _called = true;
        }
        std::this_thread::sleep_for(2ms);
        _executing = false;
    }

    std::future<void> fut() {
        return _called_prom.get_future();
    }
};

}

TEST(SigslotObserver, SlotNotCalledAfterObserverIsDestroyed) {
    using namespace std::literals::chrono_literals;

    sigslot::signal<int> signal;
    std::atomic<bool> keep_signalling{true};
    std::thread feeder{[&](){
            while (keep_signalling) {
                signal(1);
            }
            // make a last call to the signal
            signal(1);
    }};

    std::atomic<bool> executing{false};
    {
        SigslotObserver obs{executing};
        signal.connect(&SigslotObserver::slot, &obs);
        EXPECT_EQ(obs.fut().wait_for(100ms), std::future_status::ready);
    }
    // here, the observer is out of scope, and shall be destroyed and disconnected
    // if the slot is still executing, this will have bad consequences, since it could
    // access member variables which are now out of scope
    EXPECT_EQ(executing, false);
    // we can now stop signaling and join the background thread
    keep_signalling = false;
    feeder.join();
}

The test above does pass on the safe-observer branch.

Are there any plans of merging this branch at some point in the future?

> Are there any plans of merging this branch at some point in the future?

Yes, I consider this feature to be essential for the usability of the library. I am, however, dissatisfied with the current implementation.

Deferred destruction is a kind of garbage collection, which is not easy to get both right and cheap. This attempt relies on atomic ref-counting and spin-waiting on disconnection, which might be disastrous for some use cases and also incurs a non-negligible cost on every slot invocation.

I would like to implement a better mechanism that mitigates the cost of tracking slot usage. I have read about deferred ref-counting and other strategies, but those are non-trivial to implement and I have not found the time to get to it yet.
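
For reference, the scheme described above amounts to something like the following sketch (assumed from the description, not the actual branch code): each slot carries an atomic use counter that emission increments around the call, and disconnection spin-waits until the counter drops back to zero.

#include <atomic>
#include <functional>
#include <thread>

// Assumed sketch of the proof-of-concept scheme, not the actual branch code:
// emission bumps a per-slot counter around the call, and disconnection
// spin-waits until no invocation is in flight before destroying the slot.
struct counted_slot {
    std::function<void()> call;
    std::atomic<int> in_flight{0};

    void invoke() {
        in_flight.fetch_add(1, std::memory_order_acquire);
        call();
        in_flight.fetch_sub(1, std::memory_order_release);
    }

    void wait_until_idle() {
        // disconnect() would block here until the slot is no longer executing
        while (in_flight.load(std::memory_order_acquire) > 0) {
            std::this_thread::yield();
        }
    }
};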

Interesting. I did discuss the while(counter > 0) approach with some of my colleagues yesterday.

I assume that a solution with a (recursive) mutex and (maybe a) condition variable would impose too much of a performance penalty, given that you consider even the atomic reference counting too expensive.

Yes, mutexes are not an option. I have a few ideas on how this could be solved; I will try to get to it soonish.

Hi, any updates on this?

Sorry for the delay, I have been busy with other things. I absolutely want to fix this, but it is not as easy as it looks. The proof of concept on the safe-observer branch is not good enough and fails in some corner cases.

The way I tried to fix this is basically equivalent to implementing a poor man's rc-based garbage collector, which is completely out of scope for the project. I will look at it from some other angle.