NVIDIA / libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

Home Page:https://nvidia.github.io/libcudacxx

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[QUESTION] Can we remove the `isnan` check in the `complex<T>` multiplication operator?

jrhemstad opened this issue · comments

The operator*(cuda::std::complex<T>, cuda::std::complex<T>) does an isnan check on the resulting real and imaginary parts of the operation before returning the resulting complex<T>.

Is this required by the C++ Standard?

At first glance, it would seem this is not required:

If the result of a function is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.

https://eel.is/c++draft/complex.numbers#general-3

If it is not required, can we remove these checks?

@griwes @wmaxey @gonzalobg @brycelelbach

Can any of you shine some light on this?

This implementation appears to be inherited from the C99 Standard Annex G

latest-screenshot

Given that the C++ Standard does not implicitly inherit everything from the C Standard and https://eel.is/c++draft/complex.numbers#general-3 makes no mention of any requirement about anything like this, I am convinced that we can remove this from our implementation and remain conforming to the C++ Standard.

Ran a quick experiment where I modified 11.5 libcu++. Modifications were to fix double cast and comment out isnan checks. This data indicates the isnan checks are still very expensive on performance. 65us vs 110us.

The benchmark is found here: https://github.com/NVIDIA/MatX/blob/main/bench/00_transform/conv.cu

A100 isnan checks, double fix

T Samples CPU Time Noise GPU Time Noise Batch GPU Batch
cuda::std::__4::complex 31079x 120.867 us 1.40% 112.339 us 0.53% 109.970 us 31080x

A100 NO isnan checks, double fix

T Samples CPU Time Noise GPU Time Noise Batch GPU Batch
cuda::std::__4::complex 32944x 76.357 us 1.45% 67.539 us 1.59% 64.615 us 32945x

I am wary of changing the current implementation because of non subtle runtime differences it would introduce. Those would only happen in a denormal case but still could potentially silently break user code.

Given that the chance of breaking something is rather slim and users might prefer the performance I would propose to add a feature macro that removes the sanitation of denormals.

Something in the direction LIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS?

Obviously I would like to tackle this in #338

#338 because editing a comment does not link the issue / PR