rikdz / GraphWriter

Code for "Text Generation from Knowledge Graphs with Graph Transformers"

Multiple .cuda() calls might cause inconsistent device usage

rhythmswing opened this issue · comments

Hi,

I've noticed that in several .py files under /models/, such as last_graph.py and attention.py, some modules create a new tensor (mostly masks) and call .cuda() on it. Could this cause device inconsistency issues?

For example, I might want to specify a non-default GPU device, or even the CPU, in the input arguments.
At line 246 of attention.py, attention might be on cuda:1 while torch.sqrt(self._key_dim) is on cuda:0, which raises an error.

Would it be better to use .to(attention.get_device()) when attention.get_device() > -1 (i.e., when a GPU is actually in use)?
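
A minimal sketch of the device-agnostic pattern being suggested; `scaled_attention` and its arguments are hypothetical names for illustration, not the repository's actual function signature:

```python
import torch

def scaled_attention(attention: torch.Tensor, key_dim: int) -> torch.Tensor:
    # Instead of torch.sqrt(torch.tensor(key_dim)).cuda(), create the scale
    # on the same device as `attention`, so the code works on cpu, cuda:0,
    # cuda:1, etc. without hard-coding a device.
    scale = torch.sqrt(torch.tensor(float(key_dim), device=attention.device))
    return attention / scale
```

The same idea applies to the mask tensors: constructing them with `device=attention.device` (or calling `.to(attention.device)`) keeps every intermediate tensor on whatever device the inputs happen to live on.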

This is a good suggestion, thanks! I will update this soon.

Actually, I've already fixed it in my local code. Mind if I help?

Can you make a pull request? I will review it.

Let me try that, thanks.