CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

RuntimeError: module must have its parameters and buffers on device

Adaickalavan opened this issue

I attempted to train the reward model without DeepSpeed by executing python3.9 train_reward_model_gptj.py, but it fails with one of the following errors. How can I rectify this?

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

or

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
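For reference, this RuntimeError typically means a module was wrapped in nn.DataParallel (or DDP) while some of its parameters or buffers were still on the CPU or on a different GPU than device_ids[0]. A minimal, hypothetical sketch of the usual fix (not the trlx code; nn.Linear stands in for the reward model):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # stand-in for the GPT-J reward model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)         # move ALL parameters/buffers to the device first
if torch.cuda.is_available():
    # device_ids[0] must match where the parameters now live
    model = nn.DataParallel(model, device_ids=[0])
```

Calling .to(device) before wrapping ensures every parameter is on device_ids[0], which is exactly what the error message is checking.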


Can you please try using: deepspeed train_reward_model_gptj.py ?
The readme already describes how to train the reward model.

Hi @PhungVanDuy,

While running deepspeed train_reward_model_gptj.py in a multiple GPU setup, some of the initial steps, such as

  • model = GPTRewardModel("CarperAI/openai_summarize_tldr_sft")
  • train_pairs = create_comparison_dataset(data_path, "train")
  • train_dataset = PairwiseDataset(train_pairs, tokenizer, max_length=max_length)

are being executed multiple times. How do we run the initial loading and preprocessing steps only once and then share them with all the processes?

Sorry for the late response!

Normally for this case, we process the dataset in one process and share it with the others: determine the rank of each process, do the loading and preprocessing on rank 0 only, then broadcast the result to the other ranks. Make sure the data you broadcast is pickleable (a list, tensor, dict, ...).
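The rank-0-then-broadcast pattern above can be sketched with torch.distributed; this is a hypothetical helper (the function name and build_fn argument are mine, not from trlx), assuming the process group is already initialized:

```python
import torch
import torch.distributed as dist

def load_on_rank0_and_broadcast(build_fn):
    """Run build_fn() on rank 0 only, then broadcast its pickleable result."""
    rank = dist.get_rank()
    # Only rank 0 does the expensive load/preprocess; others hold a placeholder.
    holder = [build_fn() if rank == 0 else None]
    # broadcast_object_list pickles on the source rank and unpickles elsewhere,
    # so the object must be pickleable (list, tensor, dict, ...).
    dist.broadcast_object_list(holder, src=0)
    return holder[0]
```

Every rank calls this with the same build_fn, and only rank 0 actually executes it; the others receive the broadcast copy.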

For PairwiseDataset I don't think that approach is straightforward. Instead, I would suggest processing the dataset offline, saving it to a binary file, and loading it at training time.
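The offline route can be as simple as torch.save/torch.load on the preprocessed tensors; a minimal sketch (the function names and file path are illustrative, not from trlx):

```python
import torch

def save_dataset(encodings, path="pairwise_train.pt"):
    # encodings: any pickleable object, e.g. a list of dicts of tensors
    # produced once by the tokenizer, written to disk offline.
    torch.save(encodings, path)

def load_dataset(path="pairwise_train.pt"):
    # Every training process just deserializes the same file;
    # no rank coordination is needed.
    return torch.load(path)
```

Each rank then loads the same file at startup, so the tokenization cost is paid once instead of once per process.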

I think @PhungVanDuy's response was exhaustive on this