CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

RuntimeError: module must have its parameters and buffers on device

Adaickalavan opened this issue

I attempted to train the reward model without DeepSpeed by executing python3.9 train_reward_model_gptj.py, but it fails with one of the following errors. How can I rectify this?

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

or

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
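For reference, this RuntimeError typically means a module was wrapped in nn.DataParallel (or DDP) while some of its parameters or buffers were still on the CPU or on a different GPU than device_ids[0]. A minimal, hypothetical sketch of the usual fix (not the trlx code; nn.Linear stands in for the reward model):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # stand-in for the GPT-J reward model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)         # move ALL parameters/buffers to the device first
if torch.cuda.is_available():
    # device_ids[0] must match where the parameters now live
    model = nn.DataParallel(model, device_ids=[0])
```

Calling .to(device) before wrapping ensures every parameter is on device_ids[0], which is exactly what the error message is checking.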


Can you please try using: deepspeed train_reward_model_gptj.py ?
The readme already describes how to train the reward model.

Hi @PhungVanDuy,

While running deepspeed train_reward_model_gptj.py in a multiple GPU setup, some of the initial steps, such as

  • model = GPTRewardModel("CarperAI/openai_summarize_tldr_sft")
  • train_pairs = create_comparison_dataset(data_path, "train")
  • train_dataset = PairwiseDataset(train_pairs, tokenizer, max_length=max_length)

are being executed multiple times. How do we run the initial loading and preprocessing steps only once and then share them with all the processes?

Sorry for the late response!

Normally for this case, we process the dataset in one process and share it with the others: determine the rank of each process, do the loading and preprocessing on rank 0 only, then broadcast the result to the other ranks. Make sure the data you broadcast is pickleable (a list, tensor, dict, ...).
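The rank-0-then-broadcast pattern above can be sketched with torch.distributed; this is a hypothetical helper (the function name and build_fn argument are mine, not from trlx), assuming the process group is already initialized:

```python
import torch
import torch.distributed as dist

def load_on_rank0_and_broadcast(build_fn):
    """Run build_fn() on rank 0 only, then broadcast its pickleable result."""
    rank = dist.get_rank()
    # Only rank 0 does the expensive load/preprocess; others hold a placeholder.
    holder = [build_fn() if rank == 0 else None]
    # broadcast_object_list pickles on the source rank and unpickles elsewhere,
    # so the object must be pickleable (list, tensor, dict, ...).
    dist.broadcast_object_list(holder, src=0)
    return holder[0]
```

Every rank calls this with the same build_fn, and only rank 0 actually executes it; the others receive the broadcast copy.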

For PairwiseDataset I don't think that approach is straightforward. Instead, I would suggest processing the dataset offline, saving it to a binary file, and loading it at training time.
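The offline route can be as simple as torch.save/torch.load on the preprocessed tensors; a minimal sketch (the function names and file path are illustrative, not from trlx):

```python
import torch

def save_dataset(encodings, path="pairwise_train.pt"):
    # encodings: any pickleable object, e.g. a list of dicts of tensors
    # produced once by the tokenizer, written to disk offline.
    torch.save(encodings, path)

def load_dataset(path="pairwise_train.pt"):
    # Every training process just deserializes the same file;
    # no rank coordination is needed.
    return torch.load(path)
```

Each rank then loads the same file at startup, so the tokenization cost is paid once instead of once per process.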

I think @PhungVanDuy's response was exhaustive on this