Wrong vector addition result inside a CUDA kernel.
nikitanodar opened this issue
The current master version (a2844eed) produces the wrong result inside a kernel:
#include <glm/vec4.hpp>
#include <stdio.h>

__global__ void foo()
{
    glm::vec4 a{1.0f, 1.0f, 1.0f, 1.0f};
    glm::vec4 b{2.0f, 2.0f, 2.0f, 2.0f};
    glm::vec4 c = a + b;
    printf("%f %f %f %f\n%f %f %f %f\n%f %f %f %f\n%f %f %f %f\n\n", //
           a.x, a.y, a.z, a.w,                                       //
           b.x, b.y, b.z, b.w,                                       //
           a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w,               //
           c.x, c.y, c.z, c.w);
}

int main()
{
    foo<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
The result is below. The rows are a, b, the component-wise sum, and c = a + b; the last row is wrong:
1.000000 1.000000 1.000000 1.000000
2.000000 2.000000 2.000000 2.000000
3.000000 3.000000 3.000000 3.000000
1.000000 1.000000 1.000000 0.000000
The version tagged 1.0.1 works as expected, i.e. the output is:
1.000000 1.000000 1.000000 1.000000
2.000000 2.000000 2.000000 2.000000
3.000000 3.000000 3.000000 3.000000
3.000000 3.000000 3.000000 3.000000
| OS | Ubuntu 20.04 |
| nvcc --version | Cuda compilation tools, release 11.4, V11.4.315 |
| nvidia-smi | Driver Version: 535.171.04 CUDA Version: 12.2 |
The same issue reproduces on my colleagues' machines with different OSes and drivers.
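Since tag 1.0.1 is reported as good and master as bad, a variant of the repro that checks the result on the host and reports it through the exit code might help narrow down the offending commit. The following is only a sketch under assumptions: `add_kernel` and `same4` are hypothetical names introduced here, not part of GLM or the original report, and the exit codes are one possible convention.

```cuda
#include <glm/vec4.hpp>
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel: stores c = a + b in out[0..3] and the
// component-wise sum in out[4..7], using the same inputs as the report.
__global__ void add_kernel(float* out)
{
    glm::vec4 a{1.0f, 1.0f, 1.0f, 1.0f};
    glm::vec4 b{2.0f, 2.0f, 2.0f, 2.0f};
    glm::vec4 c = a + b;
    out[0] = c.x; out[1] = c.y; out[2] = c.z; out[3] = c.w;
    out[4] = a.x + b.x; out[5] = a.y + b.y;
    out[6] = a.z + b.z; out[7] = a.w + b.w;
}

// Hypothetical host helper: true when two 4-component results are identical.
// Exact comparison is acceptable here because 1.0f + 2.0f is exactly 3.0f.
bool same4(const float* lhs, const float* rhs)
{
    for (int i = 0; i < 4; ++i)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

int main()
{
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, 8 * sizeof(float)) != cudaSuccess)
        return 125; // could not run the test at all

    add_kernel<<<1, 1>>>(d_out);

    float h_out[8] = {};
    // cudaMemcpy blocks until the kernel has finished and may also
    // return an error caused by the preceding launch.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 125;
    }

    printf("operator+     : %f %f %f %f\n", h_out[0], h_out[1], h_out[2], h_out[3]);
    printf("component-wise: %f %f %f %f\n", h_out[4], h_out[5], h_out[6], h_out[7]);

    // 0 when both results agree, 1 when operator+ gives a different result.
    return same4(h_out, h_out + 4) ? 0 : 1;
}
```

Because the program exits with 0 on a match and 1 on a mismatch, it could be rebuilt against each GLM checkout and driven by `git bisect run`, which treats exit code 125 as "skip this commit".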
I found the same problem