devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation

Home Page: http://www.devitoproject.org

Why is GPU implementation significantly slower than CPU?

jinshanmu opened this issue

I was trying the GPU example script:

from devito import *
import numpy as np
import matplotlib.pyplot as plt

nx, ny = 100, 100
grid = Grid(shape=(nx, ny))

# Time-varying field; save=200 stores all 200 time steps
u = TimeFunction(name='u', grid=grid, space_order=2, save=200)
c = Constant(name='c')

# Diffusion equation: du/dt = c * laplacian(u)
eqn = Eq(u.dt, c * u.laplace)

# Rearrange into the forward stencil update
step = Eq(u.forward, solve(eqn, u.forward))

# Initial condition: an annulus (0.05 <= r**2 <= 0.1) around the centre
xx, yy = np.meshgrid(np.linspace(0., 1., nx, dtype=np.float32),
                     np.linspace(0., 1., ny, dtype=np.float32))
r = (xx - .5) ** 2. + (yy - .5) ** 2.
u.data[0, np.logical_and(.05 <= r, r <= .1)] = 1.

op = Operator([step])  # CPU operator (default platform)

stats = op.apply(dt=5e-05, c=.5)  # run the full time loop

plt.rcParams['figure.figsize'] = (20, 20)
for i in range(1, 6):
    plt.subplot(1, 5, i)          # five snapshots, 40 steps apart
    plt.imshow(u.data[(i - 1) * 40])
plt.show()

The CPU version, op = Operator([step]), returned:

Operator Kernel ran in 0.01 s

However, the GPU version, op = Operator([step], platform='nvidiaX', opt=('advanced', {'gpu-fit': u})), returned:

Operator Kernel ran in 4.74 s
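
As a quick sanity check (my suggestion, not part of the original report), printing a Devito Operator dumps the generated C source, so one can confirm the GPU build actually carries offload directives:

op_gpu = Operator([step], platform='nvidiaX', opt=('advanced', {'gpu-fit': u}))
# The generated source of a GPU build should contain OpenACC
# (#pragma acc ...) or OpenMP offload pragmas, depending on the
# configured language; a plain CPU kernel here would mean the
# platform/opt options were not picked up.
print(op_gpu)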

My CPU is an Intel Xeon Gold 6133 (80 logical cores). My GPU is an NVIDIA GeForce RTX 4080 with CUDA 11.8 and the NVIDIA HPC SDK 22.11, which works fine for other programs (e.g., PyTorch).

Any idea what is going on here?

Thank you in advance!
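
One possible angle, offered as a hedged sketch rather than a diagnosis: a 100x100 grid with 200 saved snapshots is a very small workload, so a GPU run can be dominated by JIT compilation, kernel-launch overhead, and host-device transfers rather than compute. The snippet below (my sketch; the helper time_operator and the 1024x1024 size are arbitrary choices, not from the report) times both variants on a larger grid, with a warm-up call so compilation is excluded:

import time
from devito import Grid, TimeFunction, Constant, Eq, solve, Operator

def time_operator(op, **kwargs):
    # Apply once to trigger JIT compilation, then time a second run.
    op.apply(**kwargs)
    t0 = time.perf_counter()
    op.apply(**kwargs)
    return time.perf_counter() - t0

nx = ny = 1024                    # large enough to keep a GPU busy
grid = Grid(shape=(nx, ny))
u = TimeFunction(name='u', grid=grid, space_order=2)  # no save: less host-device traffic
c = Constant(name='c')
step = Eq(u.forward, solve(Eq(u.dt, c * u.laplace), u.forward))

op_cpu = Operator([step])
op_gpu = Operator([step], platform='nvidiaX', opt=('advanced', {'gpu-fit': u}))

kwargs = dict(time_M=200, dt=5e-05, c=0.5)
print('CPU:', time_operator(op_cpu, **kwargs), 's')
print('GPU:', time_operator(op_gpu, **kwargs), 's')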

Moved to the Discussions section of devitocodes.