CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

About the weight of word embedding being nan

ItGirls opened this issue · comments

πŸ› Describe the bug

In accelerate_base_trainer.py

        for _ in range(self.config.train.epochs):
            # For each batch
            for mbs in MiniBatchIterator(self.train_dataloader, self.mb_size, self.num_mb):
                # For each update per batch
                for _ in range(self.n_updates_per_batch):
                    # Note that whereas standard policy gradient methods perform one
                    # gradient update per batch, PPO for example commonly performs
                    # multiple gradient updates on the same batch of data.
                    # https://arxiv.org/pdf/1707.06347.pdf
                    forward_time = 0
                    backward_time = 0
                    stats_accum = []
                    for mb in mbs:
                        with self._accumulate():
                            forward_time -= time()
                            loss, stats = self.loss(mb)

In the first loop, the total loss is about 0.11, but as training continues the loss changes to nan, even though the input (mb) is the same. After debugging, I found that from the second loop onward the weights of word_embeddings become nan, and I don't know why.
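
To pinpoint the exact update that corrupts word_embeddings, a check like the following could be added right after each optimizer step. This is only a debugging sketch, not trlX code, and the self.model / self.opt names in the usage comment are placeholders:

    import torch

    def find_nonfinite_params(model: torch.nn.Module):
        """Return the names of parameters that contain nan or inf values."""
        return [
            name
            for name, param in model.named_parameters()
            if not torch.isfinite(param).all()
        ]

    # Hypothetical usage inside the update loop, right after the optimizer step
    # (self.opt / self.model are placeholders for the trainer's actual attributes):
    # self.opt.step()
    # corrupted = find_nonfinite_params(self.model)
    # if corrupted:
    #     raise RuntimeError(f"Non-finite parameters after update: {corrupted}")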
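To trace where the first nan actually appears (in the loss, in the gradients, or only after the optimizer step), PyTorch's anomaly detection can be wrapped around the backward pass together with a gradient scan. Again, this is only a sketch and assumes a plain loss.backward() step rather than trlX's accelerate-wrapped backward:

    import torch

    def debug_backward(model: torch.nn.Module, loss: torch.Tensor):
        """Backward pass with anomaly detection and a report of non-finite gradients."""
        if not torch.isfinite(loss):
            raise RuntimeError(f"Loss is already non-finite: {loss.item()}")

        # detect_anomaly makes autograd report which op produced a nan/inf.
        # It slows training down considerably, so only enable it while debugging.
        with torch.autograd.detect_anomaly():
            loss.backward()

        bad_grads = [
            name
            for name, param in model.named_parameters()
            if param.grad is not None and not torch.isfinite(param.grad).all()
        ]
        if bad_grads:
            print("Non-finite gradients in:", bad_grads)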

BTW, I changed some code in trlX to adapt it to ChatGLM.

Which trlX version are you using?

No response

Additional system and package information

No response

commented

Hi @ItGirls, can you share your training script, and also make your fork public so we can see the changes that were made to the code?