Signal time is impacted by the number of disconnected slots


NicolasLacombe opened this issue 5 years ago

Hi,

A side effect of this commit, which introduced the cleanup of disconnected slots during signal emission, is that emission time now depends on the number of disconnected slots.

I'm not sure whether this should be considered a bug or just a design decision, but IMHO it would feel more natural to have a constant emission time in exchange for a longer disconnection time. The side effects would probably be a heavier memory footprint for connections and, of course, a slightly longer disconnection.
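
To make the cost concrete, here is a minimal sketch of lazy cleanup, assuming (as the thread suggests) that slots live in a vector and carry an atomic "connected" flag. The names `lazy_slot` and `lazy_signal` are hypothetical and this is not Sigslot's actual code. Disconnecting only flips the flag, so the next emission has to walk over every stale slot before erasing it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration only, not Sigslot's classes.
struct lazy_slot {
    std::function<void()> fn;
    std::atomic<bool> connected{true};  // disconnecting just clears this flag
};

class lazy_signal {
public:
    std::shared_ptr<lazy_slot> connect(std::function<void()> fn) {
        assert(fn && "a slot needs a callable");
        auto s = std::make_shared<lazy_slot>();
        s->fn = std::move(fn);
        slots_.push_back(s);
        return s;
    }

    // Calls connected slots and erases disconnected ones on the way.
    // Returns the number of slots visited, so the cost is observable.
    std::size_t emit() {
        std::size_t visited = 0;
        for (std::size_t i = 0; i < slots_.size();) {
            ++visited;
            if (slots_[i]->connected) {
                slots_[i]->fn();
                ++i;
            } else {
                // Lazy cleanup: the emission pays for earlier disconnections.
                slots_.erase(slots_.begin() + static_cast<std::ptrdiff_t>(i));
            }
        }
        return visited;
    }

private:
    std::vector<std::shared_ptr<lazy_slot>> slots_;
};
```

With three slots connected and two of them disconnected, the first `emit()` visits all three slots and only the following one visits a single slot. This sketch is single-threaded and leaves out all locking.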

I've created a unit test, "signal-performance", in my fork to highlight the issue. I might take a shot at proposing an alternative implementation, but I wanted to get your thoughts on this first.

Thanks a lot,

Nicolas.

Pierre-Antoine Lacaze · Answer 1 · Fri Apr 12 2019 06:22:13 GMT+0800 (China Standard Time)

This particular commit does not change the cleanup with respect to disconnected signals; Sigslot has been designed that way from the start.

My observation was that in my code base, slots are connected right after signal creation, and disconnected at signal destruction most of the time, which means I hardly ever need to disconnect slots right in the middle of the execution.

Note that I am not a software developer by trade, so I may not have envisioned many use cases that would justify immediate slot cleanup upon disconnection to make emission as fast as possible. Thread safety and synchronous execution of slots were, for me, the most important features.

Now I will be glad to merge such a feature if it does not introduce any side effects. That may prove to be a challenge to implement, because the way I track slot states with a lightweight object and atomic booleans is basically the opposite of what would be required.

Lacombe Nicolas · Answer 2 · Fri Apr 12 2019 20:08:03 GMT+0800 (China Standard Time)

Hi,

Thanks for the answers. But I still think that before this commit, disconnected slots were never cleaned up. That's because the slots were stored in a linked list that would only destroy itself when the signal was destroyed. That was problematic and could lead to a very deep call stack. Replacing the linked list with a vector, and adding an erase during signal emission, introduced the cleanup of disconnected slots. I might be wrong, but that's what I saw while debugging prior to this patch.

Anyway, back to the original issue: there would indeed be side effects to such a change, namely a bigger memory footprint for each connection and a slightly longer disconnection.

Another idea could be to offer a cleanup function in signal. That way one could choose to clean up the disconnected slots on demand. What would you think of such a feature?

Best,

Nicolas

Pierre-Antoine Lacaze · Answer 3 · Sat Apr 13 2019 17:55:42 GMT+0800 (China Standard Time)

I considered adding a cleanup method but I worry users will find it more confusing than useful. I am trying to avoid leaking implementation details in the public interface.

Lacombe Nicolas · Answer 4 · Sat Apr 13 2019 22:37:51 GMT+0800 (China Standard Time)

Understandable.

I'll probably take a shot at "fixing" this, or rather changing the behavior, when I get the time. I really like this library and think it's very useful, but the fact that the signal performance is impacted by the number of disconnected slots seems unnatural to me.

So I guess you can close this issue if you consider this is not really a bug, but rather a design choice. Addressing this in the documentation might be useful for library users though.

When I get the chance to try an alternative implementation, I'll submit it for review.

Pierre-Antoine Lacaze · Answer 5 · Wed Apr 17 2019 05:49:49 GMT+0800 (China Standard Time)

So, I actually had a go at it.
I made experimental changes in the immediate-disconnection branch. The main idea is to keep a reference in slots to the owning signal, and call a cleanup function whenever a slot gets disconnected.

This also makes signal emission slightly faster in the normal case because cleanup is not necessary anymore, which is a good thing.
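
The idea of a slot keeping a reference to its owning signal can be sketched roughly as follows. This is a single-threaded illustration under stated assumptions, not the code from the immediate-disconnection branch: `eager_slot`, `eager_signal` and their members are hypothetical names, all locking is omitted, and the signal is assumed to outlive its slots.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class eager_signal;

// Hypothetical slot that knows which signal owns it.
struct eager_slot {
    std::function<void()> fn;
    eager_signal* owner = nullptr;  // back-reference; signal must outlive the slot
    bool connected = true;
    void disconnect();
};

class eager_signal {
public:
    std::shared_ptr<eager_slot> connect(std::function<void()> fn) {
        assert(fn && "a slot needs a callable");
        auto s = std::make_shared<eager_slot>();
        s->fn = std::move(fn);
        s->owner = this;
        slots_.push_back(s);
        return s;
    }

    // Generic cleanup: scans the whole list and drops every disconnected slot.
    void clean() {
        slots_.erase(
            std::remove_if(slots_.begin(), slots_.end(),
                           [](const std::shared_ptr<eager_slot>& s) {
                               return !s->connected;
                           }),
            slots_.end());
    }

    // Emission no longer erases anything. The list is copied so that a slot
    // may disconnect itself while it runs. Returns the number of slots visited.
    std::size_t emit() {
        const auto snapshot = slots_;
        for (const auto& s : snapshot) {
            if (s->connected) s->fn();
        }
        return snapshot.size();
    }

    std::size_t slot_count() const { return slots_.size(); }

private:
    std::vector<std::shared_ptr<eager_slot>> slots_;
};

// Disconnection pays for the cleanup immediately.
inline void eager_slot::disconnect() {
    if (!connected) return;
    connected = false;
    owner->clean();
}
```

Here the cost moves from emission to disconnection: after two of three slots call `disconnect()`, `slot_count()` is already 1 and the next `emit()` visits a single slot.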

I am not really happy with the implementation, which passes Lockable types around everywhere.

I have not taken the time to reason about the thread safety issues this change could bring, but I think this should be fine; at the very least, the unit tests pass.

Pierre-Antoine Lacaze · Answer 6 · Wed Apr 17 2019 14:38:32 GMT+0800 (China Standard Time)

And I just pushed a less intrusive implementation.

Lacombe Nicolas · Answer 7 · Wed Apr 17 2019 15:03:31 GMT+0800 (China Standard Time)

Hi,

Thanks for having a go at this, and for adding the signal-performance test and the lambdas example.

I've reviewed the code; it looks coherent and is a nice addition. The only thing I'm questioning is: shouldn't the signal_base::clean function be much simpler?
The current implementation is generic and can clean up any number of disconnected slots by traversing and reordering the whole list. But now that every disconnection leads to a cleanup, shouldn't there always be one and only one slot to remove? I tried hacking in an assertion that verifies that, and none of the current tests triggered it.

If that's the case, you could just stop at the first disconnected slot found. I would also argue that traversing the container is not necessary: it is the slot itself that gets disconnected and triggers the cleanup, so it could give the signal a hint as to which slot needs to be cleaned up (itself). This could help the signal erase that particular slot 'directly'.
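
One possible way to give the signal such a hint is for each slot to remember its own position, so removal becomes a swap with the last element followed by a pop. The sketch below only illustrates that idea: `indexed_slot`, `indexed_signal` and `remove` are hypothetical names, an `int` stands in for the callable, and nothing here is taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical slot that remembers where it sits in its signal's vector.
struct indexed_slot {
    int id = 0;             // stand-in for the callable
    std::size_t index = 0;  // position inside the owning signal
};

class indexed_signal {
public:
    std::shared_ptr<indexed_slot> connect(int id) {
        auto s = std::make_shared<indexed_slot>();
        s->id = id;
        s->index = slots_.size();
        slots_.push_back(s);
        return s;
    }

    // O(1) removal: the slot itself is the hint, no traversal needed.
    void remove(const indexed_slot& s) {
        const std::size_t i = s.index;
        assert(i < slots_.size() && slots_[i].get() == &s);
        std::swap(slots_[i], slots_.back());  // move the last slot into the hole
        slots_[i]->index = i;                 // fix up the moved slot's index
        slots_.pop_back();                    // drop the slot being removed
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::vector<std::shared_ptr<indexed_slot>> slots_;
};
```

The trade-off is that swap-and-pop does not preserve the order in which slots were connected, which matters if slots are expected to run in connection order.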

Best,

Nicolas

Pierre-Antoine Lacaze · Answer 8 · Wed Apr 17 2019 15:28:20 GMT+0800 (China Standard Time)

Technically speaking, more than one slot can get disconnected prior to the call to the cleanup function. However, you are right that the function will be called once per disconnection, and traversing the whole list of slots every time is a pessimization.

Pierre-Antoine Lacaze · Answer 9 · Thu Apr 25 2019 05:08:04 GMT+0800 (China Standard Time)

I actually looked at it some more and had some sort of epiphany yesterday whilst wondering how signal emission could be made faster in the face of multi-threaded execution.

The main culprit right now is the unconditional copy of the slot list, which must happen so that signal emission runs outside the lock protecting access to the signal's private data. Emission under the lock won't do (and I actually made this mistake previously) because it will lead to deadlocks in some situations. The proverbial example is two signals modifying each other inside two slots at the same time. I have a unit test for that. Anyway, copies need to happen, but are they always necessary? It seems that if the signal data is not being modified by new connections or disconnections, the list of slots is only read, never modified.
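
The "copy under the lock, emit outside the lock" pattern described above can be sketched as follows. `snapshot_signal` is a hypothetical toy class, not the library's implementation. The example uses the simplest re-entrant case, a slot that connects another slot to the same signal while it runs, which would block forever if slots were invoked while the non-recursive mutex was held.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical toy signal: the lock protects the list, never the slot calls.
class snapshot_signal {
public:
    void connect(std::function<void()> fn) {
        assert(fn && "a slot needs a callable");
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.push_back(std::move(fn));
    }

    void emit() {
        std::vector<std::function<void()>> snapshot;
        {
            // Unconditional copy: this is the cost the post refers to.
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = slots_;
        }
        // The lock is released here, so slots may call connect() safely.
        for (auto& fn : snapshot) fn();
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return slots_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::function<void()>> slots_;
};
```

A slot connected during an emission is not part of the snapshot taken for that emission, so it first runs on the next `emit()`.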

So a copy is only needed when a write happens concurrently with another read or write. Well, I know this pattern, this is copy on write (COW)! Moreover, most of the time, signals are first configured and connected at the beginning of a program, and then a stream of signals is being emitted with no other modification happening, so COW should be efficient.
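
A rough sketch of the copy-on-write idea, under simplifying assumptions: the slot list is held through a `std::shared_ptr` to an immutable vector, emission only takes a reference to it, and every write builds a fresh vector. Copying on every write is a simplification of what the post describes (copying only when a write coincides with a read). `cow_signal` and its `copies()` counter are hypothetical and exist only to make the behaviour observable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical toy signal: readers share the list, writers replace it.
class cow_signal {
    using slot_list = std::vector<std::function<void()>>;

public:
    void connect(std::function<void()> fn) {
        assert(fn && "a slot needs a callable");
        std::lock_guard<std::mutex> lock(mutex_);
        // Write path: copy the current list, modify the copy, publish it.
        auto next = std::make_shared<slot_list>(*slots_);
        next->push_back(std::move(fn));
        slots_ = std::move(next);
        ++copies_;
    }

    void emit() {
        std::shared_ptr<const slot_list> snapshot;
        {
            // Read path: only a reference count is touched, no vector copy.
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = slots_;
        }
        for (const auto& fn : *snapshot) fn();
    }

    // Number of vector copies made so far (illustration only).
    std::size_t copies() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return copies_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const slot_list> slots_ = std::make_shared<slot_list>();
    std::size_t copies_ = 0;
};
```

In the connect-once, emit-many scenario described above, the copy count stays at the number of connections no matter how often `emit()` is called.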

I just pushed a simple copy-on-write implementation that does just this, and I must say the gains are nothing short of impressive. One should always be wary of micro benchmarks, but as far as things go, the signal-performance test case you provided just saw its run time shrink from approximately 35-40 µs to only 5 µs on my box.

This work lies in the immediate-disconnection branch, as I would like to take the time to ponder the implications of all the recent changes on thread safety. Unit tests are only as trustworthy as the one who wrote them :) I know I might have made a few oversights.

Thank you for enticing me to work on this. Performance was not the main design goal of this library but last year's rework to improve correctness bugged me a little because I knew it had a huge impact on performance.

Lacombe Nicolas · Answer 10 · Thu Apr 25 2019 15:47:41 GMT+0800 (China Standard Time)

Hi,

Great! Thank you for looking into it and for the detailed explanation. I learned a few things :)

I agree with you that most of the time, signals are first configured and connected at the beginning of a program. So improving this scenario using COW seems like a very good idea!

I'll try to dig a little bit deeper into the new code when I get the time!

Pierre-Antoine Lacaze · Answer 11 · Sun Apr 28 2019 03:25:13 GMT+0800 (China Standard Time)

I have gone ahead and published a new release with all those changes.
I consider the issue resolved so I will close it.