alibaba / PhotonLibOS

Probably the fastest coroutine lib in the world!

Home Page: https://PhotonLibOS.github.io


Clarification on Lock and Synchronization

kosav7r opened this issue · comments

Hello,

I am writing a small in-memory key-value store in a single-processor environment, as an experiment with the library.

As I only have a single vCPU, according to the documentation, I don't need atomics for synchronization.

“Multiple coroutines in the same OS thread have no visibility issues with each other. For example, if multiple coroutines modify variables inside a thread at the same time, we don't need to use atomic variables, and there is no need to pay attention to memory order.”

However, I am not clear on synchronization. Can't multiple threads (coroutines) still cause race conditions on the key-value store if they modify a non-atomic underlying memory address? What's the best way to achieve concurrency? Are locks preferred over atomics in this case?

Best.

Photon will not switch threads unless your code explicitly asks it to, by calling thread_yield or thread_sleep.
The underlying implementation of lock and synchronization is based on thread_sleep, as are network I/O, file I/O, and the other integrated tools.

If you need to protect something that might be changed by other threads, and your code might yield the CPU by internally calling some sleep, you should use locks.

I am using Coro20.h; could std::suspend_always also be used to yield?

Could you also give some details about Spinlock? Would it yield as well, or is it a busy wait?

coro20.h is experimental and less performant; you should not use it.

Spinlock does not yield; its code is small, so you can read the thread.h source.

Your single-vCPU program should probably not use a spinlock.

Is there any plan to improve coro20? That style of programming has been getting very popular recently. Is there any benchmark, or a ballpark estimate of how much less performant it is?

Why is the traditional multithreaded programming paradigm less popular?

I never said it's less popular; I am saying it's been getting traction. Using co_* calls and task chaining is a lot simpler than the thread API of Photon.

There are many open-source frameworks built on C++20 coroutines, but we failed to find any competitors in terms of performance. The gaps are huge. You can run their echo server examples.

Agreed, Photon seems like a very well-organized and targeted library. However, I would consider supporting co_* syntax to be just an abstraction over what's already achieved in this library. Is there any reasoning to better understand why coro20 would be less performant?


@Coldwings Let's hear what the author says.

@kosav7r Stackless coroutines, including the one provided by C++20 and coro20.h, have fundamental performance issues when you have multiple levels of invocation, i.e. function A calls function B, which further calls function C, etc. Stackless coroutines incur a high overhead along the calling chain. We provide coro20.h only because it's getting popular.

Can't multiple threads(coroutines) cause race conditions

Yes, it's still possible, if you cause a context switch inside a logical block. But it's less likely and more deterministic than with kernel threads.

by calling thread_yield or thread_sleep.

Directly or indirectly.

Note that other functions can internally invoke thread_yield or thread_sleep, such as socket send/recv, file read/write, etc.

give some details about Spinlock

Coroutines on a single vCPU should usually avoid spinlocks, because under contention the coroutine holding the lock never gets a chance to run and release it. This is effectively a deadlock.

Stackless coroutine ... incurs a high overhead to the calling chain.

Thank you for your response. Sorry if I sound like a broken record, but is there a quantifiable estimate or benchmark? How much of a performance loss is expected? Would it still be faster than libraries like libcoro or Seastar? We switched our programming model to coroutines at work and saw a huge performance improvement. And it's not only that: everybody became more productive, because the traditional way of doing things with callbacks eventually becomes callback hell.

For example: photon::thread is a great, lightweight fiber/coroutine, but implementing a simple, let's say, "read_from_disk" involves thinking about callbacks. Whereas in a coroutine API, it's as simple as

task<content> read() { /* a bunch of co_awaits, then co_return */ }

Simple to read, less code. What do you think?

We only have a micro-benchmark, solving the Tower of Hanoi puzzle. The callback-style code is:

void Hanoi(char n, char from, char to, char aux, Callback cb) {
    if (n == 0) return;
    Hanoi(n - 1, from, aux, to, cb); // move n-1 disks out of the way
    cb(n, from, to);                 // move the n-th disk
    Hanoi(n - 1, aux, to, from, cb); // move the n-1 disks on top
}

Each callback gives out a single move.

The relative time costs of C++20 stackless coroutines, C++23 stackless std::generator, and Boost.Context (stackful) are as follows (normalized to the callback version):
[image: benchmark chart of relative time costs]

This benchmark mimics the real-life situation of multiple levels of function (coroutine) invocation, while avoiding the effects of real I/O.

photon::thread ... involves thinking about callbacks

No, usually you don't need to think about callbacks with Photon. That's one of the most attractive parts of stackful coroutines.

What's the most preferred way to return a value from the photon thread upon completion without blocking?

Would you pass a promise and get a future, or continue with the callback style approach?

A simple example: reading from a database. With a C++20-style coroutine, that's

coro<content> read(string key)

We usually use shared variable(s) and a semaphore to coordinate foreground and background threads. But I believe the promise/future paradigm is a better fit for general cases.

Thank you for the PR! That’s a great step.

As someone coming from different async approaches (in different languages), mostly following a reactive programming style, I find using threads, shared variables, and a semaphore a difficult approach to solving problems.

Unfortunately, C++ didn't have any standardization in this area for a long time, so I see callbacks and shared variables as common patterns in the C++ world.

Writing async pipelines is extremely simplified if we have async compositions like coroutines or futures. I hope you will consider this more. I am open to discussion.