Change Control Interface Thread Safety for RDM EPs



a-szegel opened this issue 6 months ago.

The libfabric domain man pages currently make the following statement:

Control interfaces are always considered thread safe, and may be accessed by multiple threads.

I would like to change this statement to be:

Control interfaces are always considered thread safe for MSG EPs, and may be accessed by multiple threads. For RDM EPs, control interfaces follow the threading model of the domain.

This would allow us to have the cq->ep_list_lock, av->ep_list_lock, and cntr->ep_list_lock default to no-op when the user requests FI_THREAD_DOMAIN.
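The proposal amounts to choosing the lock implementation once, when the CQ (or AV, or counter) is created, based on the domain's threading level. The sketch below illustrates that idea under stated assumptions: it does not use libfabric, and `enum sketch_threading`, `struct sketch_lock`, and `struct sketch_cq` are hypothetical stand-ins, not the provider's real types.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two libfabric threading levels; this sketch
 * does not include <rdma/fabric.h> or call libfabric. */
enum sketch_threading {
	SKETCH_THREAD_SAFE,
	SKETCH_THREAD_DOMAIN,
};

/* Either a real mutex or a no-op, chosen once at object creation. */
struct sketch_lock {
	bool noop;
	pthread_mutex_t mutex;
};

int sketch_lock_init(struct sketch_lock *lock, enum sketch_threading level)
{
	/* Proposal: under FI_THREAD_DOMAIN the application serializes every
	 * call on the domain, control calls included, so skip the mutex. */
	lock->noop = (level == SKETCH_THREAD_DOMAIN);
	return lock->noop ? 0 : pthread_mutex_init(&lock->mutex, NULL);
}

void sketch_lock_acquire(struct sketch_lock *lock)
{
	if (!lock->noop) {
		int ret = pthread_mutex_lock(&lock->mutex);
		assert(ret == 0);
		(void)ret;
	}
}

void sketch_lock_release(struct sketch_lock *lock)
{
	if (!lock->noop)
		pthread_mutex_unlock(&lock->mutex);
}

void sketch_lock_fini(struct sketch_lock *lock)
{
	if (!lock->noop)
		pthread_mutex_destroy(&lock->mutex);
}

/* Hypothetical CQ that only tracks how many endpoints are bound to it. */
struct sketch_cq {
	struct sketch_lock ep_list_lock;
	int ep_count;
};

int sketch_cq_init(struct sketch_cq *cq, enum sketch_threading level)
{
	cq->ep_count = 0;
	return sketch_lock_init(&cq->ep_list_lock, level);
}

/* Control path: binding an EP mutates the CQ's EP list. */
void sketch_cq_bind_ep(struct sketch_cq *cq)
{
	sketch_lock_acquire(&cq->ep_list_lock);
	cq->ep_count++;
	sketch_lock_release(&cq->ep_list_lock);
}

void sketch_cq_fini(struct sketch_cq *cq)
{
	sketch_lock_fini(&cq->ep_list_lock);
}
```

Because the choice is made at creation time, the hot path pays only a predictable branch when the lock is a no-op.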

I want this because when I commented out cq->ep_list_lock and ran the NCCL allreduce tests on a p5, I saw the average bus bandwidth increase by 1.3%.

I understand this changes the API, and has the potential to break applications.

Seth Zegelstein commented 5 months ago

#9725

Shi Jin · Answer 1 · Wed Jan 03 2024 07:40:10 GMT+0800 (China Standard Time)

The issue is not related to EP type.

I would prefer that we define a new threading level, FI_THREAD_CTRL, to refine access to the control interfaces. @j-xiong, do you have any feedback on this?

Jianxin Xiong · Answer 2 · Wed Jan 03 2024 08:51:40 GMT+0800 (China Standard Time)

No, we don't want new threading levels. Instead, part of the 2.0 API effort is to reduce the number of threading levels.

I agree this is unrelated to EP type. It's more about whether the application needs the control path to be thread safe. A specific application being able to run correctly without the locking doesn't mean other applications can do the same. The default behavior should always prioritize correctness over performance.

I would prefer a provider specific option for optimizing control path locks.

Seth Zegelstein · Answer 3 · Wed Jan 03 2024 09:03:03 GMT+0800 (China Standard Time)

Where do you envision the provider-specific option for optimizing control-path locking living?

Jianxin Xiong · Answer 4 · Wed Jan 03 2024 09:12:44 GMT+0800 (China Standard Time)

For example, via the fi_set_val() API.
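On the provider side, such an option might be handled roughly as follows. This is a minimal sketch, not libfabric code: `struct hypo_domain`, `hypo_domain_set_val()`, `hypo_domain_create_obj()`, and `HYPO_OPT_CTRL_LOCK_NOOP` are hypothetical names standing in for whatever a provider would actually expose through fi_set_val(). Rejecting the change once objects exist is also an assumption, made because swapping a lock implementation under live objects would be unsafe.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical provider-specific option name; not a real libfabric constant. */
#define HYPO_OPT_CTRL_LOCK_NOOP 1

struct hypo_domain {
	bool ctrl_lock_noop; /* skip control-path (EP list) locking */
	int bound_objects;   /* CQs/AVs/counters already created */
};

/* Creating a CQ/AV/counter locks in the current lock choice. */
void hypo_domain_create_obj(struct hypo_domain *dom)
{
	assert(dom != NULL);
	dom->bound_objects++;
}

/* Models how a provider might handle a set_val-style request. */
int hypo_domain_set_val(struct hypo_domain *dom, int name, void *val)
{
	if (dom == NULL || val == NULL)
		return -EINVAL;

	switch (name) {
	case HYPO_OPT_CTRL_LOCK_NOOP:
		/* Objects created earlier already picked their lock type. */
		if (dom->bound_objects > 0)
			return -EBUSY;
		dom->ctrl_lock_noop = *(bool *)val;
		return 0;
	default:
		/* Unknown option for this provider. */
		return -ENOPROTOOPT;
	}
}
```

The opt-in keeps the default thread-safe control path intact, which matches the concern that correctness should win by default.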

Seth Zegelstein · Answer 5 · Wed Jan 03 2024 09:20:37 GMT+0800 (China Standard Time)

From my perspective, FI_THREAD_DOMAIN should mean that only one thread per domain at a time may make libfabric API calls, and that this applies to ALL calls, control calls included. Adding a thread-safe control API adds unnecessary complexity just to optimize the performance of connection-based MSG endpoints. This is related to EP type, because that decision was made to optimize a specific EP type.

I would prefer to change the API to say:

"FI_THREAD_DOMAIN means 1 thread per domain at a time", and require apps that relied on the previous threading model to request FI_THREAD_SAFE.

Since breaking existing customers isn't great, we should make this change in version 2.0.

Adding a breaking change to Libfabric 2.0 is acceptable. This suggested change will make our API simpler to understand, cleaner to implement, and faster to use.

Seth Zegelstein · Answer 6 · Wed Jan 03 2024 13:16:09 GMT+0800 (China Standard Time)

If we want to avoid a performance regression for applications designed around message endpoints (those using FI_THREAD_DOMAIN with multiple threads in a domain at a time, e.g. thread t1 on the control path and thread t2 on the data path), then I agree with Shi that a new threading model is the best way to do this.
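The t1/t2 pattern is what the EP-list lock protects today: t1 binds endpoints (mutating the list) while t2 walks the same list during progress. The following is a minimal pthreads sketch of that interaction, assuming a simplified array-based list; `struct demo_cq`, `run_t1_t2()`, and the iteration counts are hypothetical. With the mutex removed, the two threads would race on `ep_count` and `ep_ids`.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_EPS 64

/* Hypothetical CQ holding a list of bound endpoints. */
struct demo_cq {
	pthread_mutex_t ep_list_lock;
	int ep_ids[MAX_EPS];
	int ep_count;
	long progress_visits; /* EP entries the data path has walked */
};

/* t1: control path, binding endpoints to the CQ. */
static void *control_thread(void *arg)
{
	struct demo_cq *cq = arg;
	for (int i = 0; i < MAX_EPS; i++) {
		pthread_mutex_lock(&cq->ep_list_lock);
		assert(cq->ep_count < MAX_EPS);
		cq->ep_ids[cq->ep_count] = i;
		cq->ep_count++;
		pthread_mutex_unlock(&cq->ep_list_lock);
	}
	return NULL;
}

/* t2: data path, progressing every bound endpoint while polling the CQ. */
static void *data_thread(void *arg)
{
	struct demo_cq *cq = arg;
	for (int iter = 0; iter < 1000; iter++) {
		pthread_mutex_lock(&cq->ep_list_lock);
		for (int i = 0; i < cq->ep_count; i++)
			cq->progress_visits += (cq->ep_ids[i] >= 0);
		pthread_mutex_unlock(&cq->ep_list_lock);
	}
	return NULL;
}

/* Runs t1 and t2 concurrently; returns the number of bound EPs or -1. */
int run_t1_t2(struct demo_cq *cq)
{
	pthread_t t1, t2;

	if (pthread_mutex_init(&cq->ep_list_lock, NULL) != 0)
		return -1;
	cq->ep_count = 0;
	cq->progress_visits = 0;

	if (pthread_create(&t1, NULL, control_thread, cq) != 0)
		return -1;
	if (pthread_create(&t2, NULL, data_thread, cq) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_mutex_destroy(&cq->ep_list_lock);
	return cq->ep_count;
}
```

Under the "one thread per domain at a time" reading, an application with this shape would need to request FI_THREAD_SAFE instead.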

Users will consult the threading model docs before they look for fi_set_val() options; the threading model is much less hidden.
Why should we create a new way to control the application threading model when we already have the threading model?
If we want to keep the FI_THREAD_DOMAIN behavior unchanged, we can add a new model, FI_THREAD_DOMAIN_RDM, that does not guarantee a thread-safe control API.
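The difference between the existing FI_THREAD_DOMAIN and the proposed FI_THREAD_DOMAIN_RDM can be summarized as a small decision table for which paths a provider must lock. This is a sketch under assumptions: FI_THREAD_DOMAIN_RDM exists only as a proposal in this thread, and the `enum proposed_threading` values and both helper functions are hypothetical local names, not libfabric definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for threading levels. FI_THREAD_DOMAIN_RDM is only a
 * proposal in this discussion; it is not part of libfabric. */
enum proposed_threading {
	PROPOSED_THREAD_SAFE,       /* provider serializes everything */
	PROPOSED_THREAD_DOMAIN,     /* app serializes data calls; control stays thread safe */
	PROPOSED_THREAD_DOMAIN_RDM, /* app serializes ALL calls on the domain */
};

/* Must the provider lock control-path state such as EP lists? */
bool ctrl_path_needs_lock(enum proposed_threading level)
{
	assert(level >= PROPOSED_THREAD_SAFE &&
	       level <= PROPOSED_THREAD_DOMAIN_RDM);
	switch (level) {
	case PROPOSED_THREAD_DOMAIN_RDM:
		return false;
	case PROPOSED_THREAD_SAFE:
	case PROPOSED_THREAD_DOMAIN:
	default:
		return true; /* conservative default */
	}
}

/* Must the provider lock data-path state? */
bool data_path_needs_lock(enum proposed_threading level)
{
	return level == PROPOSED_THREAD_SAFE;
}
```

Only the control-path column changes between the two domain-level models, which is the one lock the 1.3% measurement was about.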

I understand the desire to have fewer threading models, but I don't see the advantage of solving this problem without using the threading model.