alibaba / PhotonLibOS

Probably the fastest coroutine lib in the world!

Home Page: https://PhotonLibOS.github.io


Clarification on Lock and Synchronization

kosav7r opened this issue · comments

Hello,

I am writing a small in-memory key-value store in a single-processor environment, as an experiment with the library.

As I only have a single vCPU, according to the documentation, I don't need atomics for synchronization.

“Multiple coroutines in the same OS thread have no visibility issues with each other. For example, if multiple coroutines modify variables inside a thread at the same time, we don't need to use atomic variables, and there is no need to pay attention to memory order.”

However, I am not clear on synchronization. Can't multiple threads (coroutines) still cause race conditions on the key-value store if they modify a non-atomic underlying memory address? What's the best way to achieve concurrency? Are locks preferred over atomics in this case?

Best.

Photon will not switch threads unless your code explicitly asks it to, by calling thread_yield or thread_sleep.
The underlying implementation of lock and synchronization is based on thread_sleep, as are network I/O, file I/O, and the other integrated tools.

If you need to protect something that might be changed by other threads, and your code might yield the CPU by internally calling some sleep, you should use locks.

I am using Coro20.h; could std::suspend_always also be used to yield?

Could you also give some details about Spinlock? Would it yield as well, or is it a busy wait?

coro20.h is experimental and less performant; you should not use it.

Spinlock does not yield; its code is small, so you can read the thread.h source.

Your single-vCPU program should probably not use a spinlock.

Is there any plan to improve coro20? That style of programming has been getting very popular recently. Is there any benchmark, or a ballpark estimate of how much less performant it is?

Why is the traditional multithreaded programming paradigm less popular?

I never said it's less popular; I am saying it's been getting traction. Using co_* calls and task chaining is a lot simpler than the thread API of Photon.

There are many open-source frameworks built on C++20 coroutines, but we failed to find any competitors in terms of performance. The gaps are huge. You can run their echo server examples.

Agreed, Photon seems like a very well-organized and targeted library. However, I would consider supporting co_* syntax to be just an abstraction over what's already achieved in this library. Is there any reasoning to better understand why coro20 would be less performant?


@Coldwings Let's hear what the author says.

@kosav7r Stackless coroutines, including the one provided by C++20 and coro20.h, have fundamental performance issues when you have multiple levels of invocation, i.e. function A calls function B, which further calls function C, etc. Stackless coroutines incur a high overhead along the calling chain. We provide coro20.h only because it's getting popular.

Can't multiple threads(coroutines) cause race conditions

Yes, it's still possible, if you cause a context switch inside a logical block. But it's less likely and more deterministic than with kernel threads.

by calling thread_yield or thread_sleep.

Directly or indirectly.

Note that other functions can internally invoke thread_yield or thread_sleep, such as socket send/recv, file read/write, etc.

give some details about Spinlock

Coroutines on a single vCPU should usually avoid spinlocks, because under contention the coroutine holding the lock never gets a chance to run and release it. This is effectively a deadlock.

Stackless coroutine ... incurs a high overhead to the calling chain.

Thank you for your response. Sorry if I sound like a broken record, but is there a quantifiable estimate or benchmark? How much of a performance loss is expected? Would it still be faster than libraries like libcoro or Seastar? We switched our programming model to coroutines at work and saw a huge performance improvement. And it's not only that: everybody became more productive, because the traditional way of doing things with callbacks eventually becomes callback hell.

For example: photon::thread is a great, lightweight fiber/coroutine, but implementing a simple, let's say, "read_from_disk" involves thinking about callbacks. Whereas in a coroutine API, it's as simple as

task<content> read() { /* a bunch of co_awaits, then co_return */ }

Simple to read, less code. What do you think?

We only have a micro-benchmark, solving the Tower of Hanoi puzzle. The callback-style code is:

void Hanoi(char n, char from, char to, char aux, Callback cb) {
    if (n == 0) return;
    Hanoi(n - 1, from, aux, to, cb); // move n-1 disks out of the way
    cb(n, from, to);                 // move the n-th disk
    Hanoi(n - 1, aux, to, from, cb); // move the n-1 disks on top
}

Each callback gives out a single move.

The relative time costs of C++20 stackless coroutines, C++23 stackless std::generator, and Boost.Context (stackful) are as follows (normalized to the callback version):
[image: benchmark chart of relative time costs]

This benchmark mimics the real-life situation of multiple levels of function (coroutine) invocation, while avoiding the effects of real I/O.

photon::thread ... involves thinking about callbacks

No, usually you don't need to think about callbacks with Photon. That's one of the most attractive parts of stackful coroutines.

What's the most preferred way to return a value from the photon thread upon completion without blocking?

Would you pass a promise and get a future, or continue with the callback style approach?

A simple example: reading from a database. With a C++20-style coroutine, that's

coro<content> read(string key)

We usually use shared variable(s) and a semaphore to coordinate foreground and background threads. But I believe the promise/future paradigm is a better fit for general cases.

Thank you for the PR! That’s a great step.

As someone coming from different async approaches (in different languages), mostly following a reactive programming style, I find using threads, shared variables, and a semaphore a difficult approach to solving problems.

Unfortunately, C++ didn't have any standardization in this area for a long time, so I see callbacks and shared variables as common patterns in the C++ world.

Writing async pipelines is extremely simplified if we have async compositions like coroutines or futures. I hope you will consider this more. I am open to discussion.