Word embedding weights become nan during training
ItGirls opened this issue · comments
ItGirls commented
🐛 Describe the bug
In `accelerate_base_trainer.py`:

```python
for _ in range(self.config.train.epochs):
    # For each batch
    for mbs in MiniBatchIterator(self.train_dataloader, self.mb_size, self.num_mb):
        # For each update per batch
        for _ in range(self.n_updates_per_batch):
            # Note that whereas standard policy gradient methods perform one
            # gradient update per batch, PPO for example commonly performs
            # multiple gradient updates on the same batch of data.
            # https://arxiv.org/pdf/1707.06347.pdf
            forward_time = 0
            backward_time = 0
            stats_accum = []
            for mb in mbs:
                with self._accumulate():
                    forward_time -= time()
                    loss, stats = self.loss(mb)
```
In the first loop, the total loss is 0.11, but as the loop continues the loss changes to nan, even though the input (mb) is the same. After debugging, I found that from the second loop onward the weights of word_embeddings become nan, and I don't know why.
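To pin down when the nan first appears, here is a minimal sketch of the kind of check that can be run around each update (it is illustrative, not trlx code; `model` stands for the policy model, and `get_input_embeddings()` assumes a Hugging Face-style interface):

```python
import torch

def report_nans(model, loss, step):
    # A non-finite loss on this minibatch is the earliest visible symptom.
    if not torch.isfinite(loss).all():
        print(f"step {step}: loss is non-finite")
    emb = model.get_input_embeddings().weight
    # Catch nan gradients before the optimizer applies them...
    if emb.grad is not None and torch.isnan(emb.grad).any():
        print(f"step {step}: nan gradient flowing into the word embeddings")
    # ...and nan weights after the update has been applied.
    if torch.isnan(emb).any():
        print(f"step {step}: word embedding weights contain nan")
```

Enabling `torch.autograd.set_detect_anomaly(True)` beforehand also makes backward raise on the exact op that produced the nan, at the cost of much slower training.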
BTW, I changed some code in trlx to make it work with ChatGLM.
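Since the nan only shows up with the ChatGLM changes, one quick sanity check (a sketch; `model` and `tokenizer` are whatever the modified code loads) is to confirm the adapted model already produces finite logits before any PPO update:

```python
import torch

with torch.no_grad():
    inputs = tokenizer("test prompt", return_tensors="pt")
    logits = model(**inputs).logits
assert torch.isfinite(logits).all(), "non-finite logits before any training step"
```

If this fails, the problem lies in the model adaptation (for example, fp16 overflow) rather than in the PPO loop itself.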
Which trlX version are you using?
No response
Additional system and package information
No response