flame / blis

BLAS-like Library Instantiation Software Framework

Avoiding unwanted comm and barriers in "L3 sup" path when packing is disabled gave great performance boost

BhaskarNallani opened this issue

Hi @fgvanzee, @devinamatthews

We observed large performance improvements in the "gemmsup" path when we avoid creating the thread communicator and barriers while packing is disabled ("No Packing") in sup.

When packing is disabled in the L3 sup APIs, the threads have nothing to synchronize or communicate, so there is no need for a communicator or barriers between them. We can simply skip creating the communicator in the bli_thrinfo_sup_create_for_cntl() function, along with the barriers.
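To make the idea concrete, here is a minimal, self-contained sketch of the pattern. It is not the BLIS code: every `toy_` name is hypothetical, the "barrier" only counts calls instead of blocking, and the threads are simulated by a loop. It only shows that when neither operand is packed, no communicator is created and every barrier call becomes a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical communicator. A real one would hold a barrier and a slot
   for sharing the packed-buffer pointer; here it only counts barrier
   calls so the effect of skipping it is visible. */
typedef struct {
    int n_threads;
    int barriers_hit;
} toy_comm_t;

/* Hypothetical per-thread info node. */
typedef struct {
    int         tid;
    toy_comm_t *comm;  /* NULL means "no synchronization needed" */
} toy_thrinfo_t;

/* Create a communicator only if at least one operand will be packed. */
toy_comm_t *toy_comm_create_if_needed(int n_threads, bool pack_a, bool pack_b)
{
    assert(n_threads > 0);
    if (!pack_a && !pack_b) return NULL;

    toy_comm_t *comm = malloc(sizeof *comm);
    if (comm == NULL) abort();
    comm->n_threads = n_threads;
    comm->barriers_hit = 0;
    return comm;
}

/* Barrier wrapper: a no-op when the thread has no communicator. */
void toy_barrier(toy_thrinfo_t *thread)
{
    if (thread->comm == NULL) return;
    thread->comm->barriers_hit++;  /* a real barrier would block here */
}

/* Simulate one pass over all threads and return the number of barrier
   calls that were actually executed. */
int toy_sup_pass(int n_threads, bool pack_a, bool pack_b)
{
    toy_comm_t *comm = toy_comm_create_if_needed(n_threads, pack_a, pack_b);

    for (int tid = 0; tid < n_threads; tid++) {
        toy_thrinfo_t thread = { tid, comm };
        toy_barrier(&thread);  /* before packing: buffer must be free */
        /* ... packing into the shared buffer would happen here ... */
        toy_barrier(&thread);  /* after packing: buffer must be complete */
        /* ... the compute kernel would run here ... */
    }

    int hits = (comm != NULL) ? comm->barriers_hit : 0;
    free(comm);
    return hits;
}
```

In this sketch, `toy_sup_pass(4, false, false)` executes no barrier calls, while `toy_sup_pass(4, true, false)` executes eight (two per simulated thread).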

This simple change gave a large performance uplift for L3 sup APIs such as gemmsup and brought overall BLIS performance to a highly competitive level.

Kindly consider updating the code accordingly and we can discuss further if there are any side effects to it.

@BhaskarNallani @MithunMohanKadavil @AOCL-Team

Thanks for reminding me of this, Bhaskar. I'll add it to my list.

The changes look fine to me. Thanks.

Thanks Field for the fix. We can update the sup xgemm graphs later.