[Question] Why do the layers of the model have no gradient, even though the loss decreases from the 3rd epoch when running the train_mem.py file?
weixiaochen358 opened this issue
Question
Why do the layers of the model have no gradient, even though the loss decreases from the 3rd epoch when running the train_mem.py file?
from transformers import TrainerCallback

class GradientCheckingCallback(TrainerCallback):
    def on_step_end(self, args, state, control, **kwargs):
        # This method is called at the end of each training step
        model = kwargs['model']  # Access the model
        for name, param in model.named_parameters():
            if 'mm_projector' in name and param.grad is not None:
                print(f"Gradient for {name}: {param.grad.norm().item()}")
            elif 'mm_projector' in name:
                print(f"No gradient for {name}")

trainer = LLaVATrainer(model=model,
                       tokenizer=tokenizer,
                       args=training_args,
                       callbacks=[GradientCheckingCallback()],
                       **data_module)
No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 8.1787, 'grad_norm': 126.70487213134766, 'learning_rate': 0.001, 'epoch': 1.0}
20%|█████████ | 1/5 [00:01<00:04, 1.13s/it]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 8.1787, 'grad_norm': 126.70057678222656, 'learning_rate': 0.00075, 'epoch': 2.0}
40%|██████████████████ | 2/5 [00:01<00:02, 1.06it/s]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 9.7352, 'grad_norm': 19.390901565551758, 'learning_rate': 0.0005, 'epoch': 3.0}
60%|███████████████████████████ | 3/5 [00:02<00:01, 1.13it/s]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 6.939, 'grad_norm': 9.54937744140625, 'learning_rate': 0.00025, 'epoch': 4.0}
80%|████████████████████████████████████ | 4/5 [00:03<00:00, 1.17it/s]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 4.7991, 'grad_norm': 5.036875247955322, 'learning_rate': 0.0, 'epoch': 5.0}
{'train_runtime': 4.3987, 'train_samples_per_second': 2.273, 'train_steps_per_second': 1.137, 'train_loss': 7.56612548828125, 'epoch': 5.0}
100%|█████████████████████████████████████████████| 5/5 [00:04<00:00, 1.14it/s]
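
One thing that may be worth ruling out (an assumption, not a confirmed diagnosis): the logged grad_norm values above are nonzero, so gradients appear to exist at some point during each step, and they may simply have been cleared before on_step_end runs. The sketch below reproduces that effect in plain PyTorch, outside of LLaVATrainer. The `projector` module and the `grad_norms` helper are hypothetical stand-ins made up for illustration; they are not the actual mm_projector or any LLaVA API.

```python
import torch
from torch import nn

# Hypothetical stand-in for the projector (an assumption for illustration only).
projector = nn.Sequential(nn.Linear(4, 8), nn.GELU(), nn.Linear(8, 4))
optimizer = torch.optim.SGD(projector.parameters(), lr=0.1)


def grad_norms(module):
    """Map each parameter name to its gradient norm, or None if .grad is unset."""
    return {
        name: (None if p.grad is None else p.grad.norm().item())
        for name, p in module.named_parameters()
    }


torch.manual_seed(0)
loss = projector(torch.randn(2, 4)).pow(2).mean()
loss.backward()
before_zero_grad = grad_norms(projector)  # read right after backward()

optimizer.step()
optimizer.zero_grad(set_to_none=True)  # explicitly reset .grad to None
after_zero_grad = grad_norms(projector)  # read after the gradients were cleared

print("after backward: ", before_zero_grad)
print("after zero_grad:", after_zero_grad)
```

If this is the cause, reading the gradients earlier in the step, before they are zeroed, should show non-None values. Recent transformers releases document on_pre_optimizer_step and on_optimizer_step events on TrainerCallback for monitoring gradients; whether the installed version provides them needs to be checked. When training under DeepSpeed, gradients may also not be exposed through param.grad at all, which would be a separate cause.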