alibaba / PhotonLibOS

Probably the fastest coroutine lib in the world!

Home Page: https://PhotonLibOS.github.io

Thread-per-core Architecture

kosav7r opened this issue · comments

Hi folks,

I'm in the process of building a storage system and evaluating PhotonLib. Some of the architectural decisions for the system:

  1. Thread per core, no context switching is desired
  2. Share nothing; each thread will allocate its own resources, including memory and network (sockets). The goal is to eliminate synchronization and improve cache efficiency; this is absolutely essential.
  3. Interrupt Affinity

Do you have any recommendations on thread-to-thread communication without sharing any memory?

The coroutine-based approach is new to me. Can I achieve these basic needs with PhotonLib? If so, are there any examples?

Thanks!

Kindly pinging :)

Yes, Photon coroutines can satisfy your requirements. Every resource is located in a single thread and shared among its coroutines. You can read the documentation for more details.

A Linux thread is referred to as a vCPU in Photon. Each of them has a dedicated scheduler for coroutines (Photon's threads) and a dedicated instance of an event engine (e.g. epoll or io_uring). Their execution is basically independent of one another, unless you conduct inter-vCPU task coordination or migration.
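
A minimal sketch of that model, for illustration: it assumes `photon::init()`/`photon::fini()` can be called once per OS thread to set that thread up as a vCPU with its own event engine (the WorkPool mentioned later in this thread packages this up for you), and that the engine flags and `thread_create11`/`thread_enable_join` helpers match your Photon version. Please verify against the headers and docs before relying on it.

```cpp
// Sketch: one OS thread per core, each set up as a Photon vCPU with its own
// event engine; coroutines created on a vCPU are scheduled only on that vCPU.
// Flags and helper names are assumptions taken from the Photon docs.
#include <photon/photon.h>
#include <photon/thread/thread11.h>
#include <thread>
#include <vector>
#include <cstdio>

static void run_vcpu(int id) {
    // Turn this OS thread into a vCPU with its own event engine (assumption:
    // per-thread init is supported in your version; otherwise use WorkPool).
    if (photon::init(photon::INIT_EVENT_DEFAULT, photon::INIT_IO_NONE) != 0) return;

    // A coroutine (Photon thread) local to this vCPU.
    auto* th = photon::thread_create11([id] {
        printf("coroutine running on vCPU %d\n", id);
    });
    auto* jh = photon::thread_enable_join(th);
    photon::thread_join(jh);

    photon::fini();
}

int main() {
    std::vector<std::thread> vcpus;
    for (int i = 0; i < 4; ++i) vcpus.emplace_back(run_vcpu, i);
    for (auto& t : vcpus) t.join();
}
```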

You can realize interrupt affinity in the same way you would for other applications, e.g. by pinning the interrupt handler and the corresponding Photon vCPU (Linux thread) to the same physical CPU core.
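
For the vCPU side, plain pthread affinity is enough; a small sketch (pure POSIX, no Photon-specific API involved), to be combined with steering the NIC/SSD IRQs to the same core via /proc/irq/<N>/smp_affinity_list:

```cpp
// Pin the calling OS thread (e.g. a Photon vCPU) to one physical core.
#include <pthread.h>
#include <sched.h>

static bool pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);              // restrict this thread to `core` only
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```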

What would you recommend for separating CPU-bound and IO-bound jobs in this programming model? As far as I know, coroutines are for IO-bound jobs.

You can use the migrate API to move your CPU-bound tasks to a specific vCPU. It's lightweight.
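
A rough sketch of that pattern, for illustration only: `photon::thread_migrate`, `photon::CURRENT` and `photon::vcpu_base` appear in Photon's thread headers, but treat the exact signature as an assumption to check against your version; `compute_vcpu`/`io_vcpu` are hypothetical handles captured elsewhere (e.g. at vCPU startup), and `heavy_compute` is a placeholder.

```cpp
// Illustrative sketch: hop a coroutine onto a dedicated compute vCPU for the
// CPU-bound phase, then hop back. Handles are hypothetical; verify the
// thread_migrate signature against your Photon version.
#include <photon/photon.h>
#include <photon/thread/thread.h>

void handle_request(photon::vcpu_base* compute_vcpu, photon::vcpu_base* io_vcpu) {
    // ... IO-bound phase runs on the network/IO vCPU ...

    photon::thread_migrate(photon::CURRENT, compute_vcpu);  // move to the compute vCPU
    // heavy_compute();                                     // hypothetical CPU-bound work

    photon::thread_migrate(photon::CURRENT, io_vcpu);       // move back to send the reply
    // ... respond ...
}
```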

What would you recommend for inter-processor communication if shared memory is absolutely not an option?

Is this a Photon-related issue?

I already appreciate your answers, so I apologize if this sounds unrelated. I am evaluating and comparing Photon with Seastar, and trying to map approaches in Seastar to Photon.

For example, in Seastar this is mostly done by passing a lambda to a neighboring vCPU. I was wondering what you think is the best approach for communication between vCPUs.

A Photon thread (coroutine) is essentially a function. A lambda is the same.

The underlying implementation of thread migration is an eventfd notification plus a task queue.

Besides that, Photon also has an MPMC queue for transmitting functions, encapsulated as the so-called WorkPool.
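
As an illustration of that (roughly the Photon counterpart of Seastar's "pass a lambda to a neighboring vCPU"), here is a sketch using WorkPool. The constructor arguments and the blocking call() signature are taken from the Photon docs, but may differ across versions, so treat them as assumptions.

```cpp
// Sketch: hand a lambda to a neighboring worker vCPU via WorkPool and wait
// for the result. Flags/signatures are assumptions from the Photon docs.
#include <photon/photon.h>
#include <photon/thread/workerpool.h>
#include <cstdio>

int main() {
    photon::init(photon::INIT_EVENT_DEFAULT, photon::INIT_IO_NONE);

    // 4 worker vCPUs, each with its own coroutine scheduler and event engine.
    photon::WorkPool pool(4, photon::INIT_EVENT_DEFAULT, photon::INIT_IO_NONE);

    int result = 0;
    // call() runs the lambda on one of the worker vCPUs and waits for it.
    pool.call([&result] { result = 42; });
    printf("result = %d\n", result);

    photon::fini();
}
```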

I've searched the code. How about using sched_setaffinity (Linux) / thread_policy_set (macOS) to bind a vCPU to a single CPU core? @beef9999

@loongs-zhang Are you suggesting that we bind vCPU by default?

What would you recommend for inter-processor communication if shared memory is absolutely not an option?

Multi-process without sharing memory? How about a UNIX domain socket?
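
To make the suggestion concrete, here is a minimal share-nothing message-passing sketch between two processes over an AF_UNIX socket pair. It uses plain POSIX calls for clarity; inside each process/vCPU the same file descriptors could be driven non-blockingly by Photon's event engine.

```cpp
// Sketch: two processes exchange a message over an AF_UNIX socket pair,
// with no shared memory between them.
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 1;

    if (fork() == 0) {                  // child process: uses fds[1] only
        close(fds[0]);
        const char msg[] = "hello from child";
        write(fds[1], msg, sizeof(msg));
        close(fds[1]);
        _exit(0);
    }

    close(fds[1]);                      // parent process: uses fds[0] only
    char buf[64] = {};
    read(fds[0], buf, sizeof(buf) - 1);
    printf("parent received: %s\n", buf);
    close(fds[0]);
    wait(nullptr);
}
```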

@loongs-zhang Are you suggesting that we bind vCPU by default?

yes

What would you recommend for inter-processor communication if shared memory is absolutely not an option?
Multi-process without sharing memory? How about a UNIX domain socket?

How about deep cloning and sharing?

@loongs-zhang Are you suggesting that we bind vCPU by default?

yes

As different apps require different binding configurations, it's difficult for us to do it by default.
For example, a typical scenario is a file/storage server. We may need to consider the IRQ handlers of the NICs and SSDs, as well as our service threads (vCPUs). The best binding configuration should minimize CPU switching along the execution path.

How about deep cloning and sharing?

I'm not sure whether cloning is feasible, as it may imply sharing in the first place, and @kosav7r said shared memory was absolutely not an option.

What would you recommend for separating CPU-bound and IO-bound jobs in this programming model? As far as I know, coroutines are for IO-bound jobs.

Photon has a built-in WorkPool to deal with various kinds of background jobs. For IO-bound ones, you can initialize the worker vCPUs to enable coroutines and event engines. For CPU-bound ones, you can simply use kernel threads without initializing Photon.

BTW, the jobs are efficiently passed to workers through a lock-free shared-memory ring queue.
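
A short sketch of that split, under the same assumptions as the WorkPool sketch earlier in the thread (constructor flags and call() signature may differ by version): IO-bound jobs go to worker vCPUs that have coroutines and event engines enabled, while CPU-bound jobs run on ordinary kernel threads that never initialize Photon.

```cpp
// Sketch: separate IO-bound work (Photon worker vCPUs) from CPU-bound work
// (plain kernel threads, no Photon environment). Flags are assumptions.
#include <photon/photon.h>
#include <photon/thread/workerpool.h>
#include <thread>

int main() {
    photon::init(photon::INIT_EVENT_DEFAULT, photon::INIT_IO_NONE);

    // IO-bound: workers are full vCPUs with coroutine schedulers + event engines.
    photon::WorkPool io_pool(2, photon::INIT_EVENT_DEFAULT, photon::INIT_IO_NONE);
    io_pool.call([] { /* e.g. non-blocking socket or file IO */ });

    // CPU-bound: an ordinary kernel thread, no Photon init needed.
    std::thread cpu_worker([] { /* heavy computation */ });
    cpu_worker.join();

    photon::fini();
}
```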