haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Home Page: https://llava.hliu.cc


[Question] Why do the model's layers have no gradient, even though the loss decreases from the 3rd epoch, when running train_mem.py?

weixiaochen358 opened this issue · comments

Question

Why do the model's layers have no gradient, even though the loss decreases from the 3rd epoch, when running the train_mem.py file? I added a callback to print the gradient norms of the mm_projector parameters at the end of each step:

    from transformers import TrainerCallback

    class GradientCheckingCallback(TrainerCallback):
        def on_step_end(self, args, state, control, **kwargs):
            # Called at the end of each training step
            model = kwargs['model']  # Access the model
            for name, param in model.named_parameters():
                if 'mm_projector' in name and param.grad is not None:
                    print(f"Gradient for {name}: {param.grad.norm().item()}")
                elif 'mm_projector' in name:
                    print(f"No gradient for {name}")

    trainer = LLaVATrainer(model=model,
                           tokenizer=tokenizer,
                           args=training_args,
                           callbacks=[GradientCheckingCallback()],
                           **data_module)

No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 8.1787, 'grad_norm': 126.70487213134766, 'learning_rate': 0.001, 'epoch': 1.0}
20%|█████████ | 1/5 [00:01<00:04, 1.13s/it]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 8.1787, 'grad_norm': 126.70057678222656, 'learning_rate': 0.00075, 'epoch': 2.0}
40%|██████████████████ | 2/5 [00:01<00:02, 1.06it/s]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 9.7352, 'grad_norm': 19.390901565551758, 'learning_rate': 0.0005, 'epoch': 3.0}
60%|███████████████████████████ | 3/5 [00:02<00:01, 1.13it/s]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 6.939, 'grad_norm': 9.54937744140625, 'learning_rate': 0.00025, 'epoch': 4.0}
80%|████████████████████████████████████ | 4/5 [00:03<00:00, 1.17it/s]No gradient for model.mm_projector.0.weight
No gradient for model.mm_projector.0.bias
No gradient for model.mm_projector.2.weight
No gradient for model.mm_projector.2.bias
{'loss': 4.7991, 'grad_norm': 5.036875247955322, 'learning_rate': 0.0, 'epoch': 5.0}
{'train_runtime': 4.3987, 'train_samples_per_second': 2.273, 'train_steps_per_second': 1.137, 'train_loss': 7.56612548828125, 'epoch': 5.0}
100%|█████████████████████████████████████████████| 5/5 [00:04<00:00, 1.14it/s]
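A likely explanation, sketched under the assumption that the stock Hugging Face `Trainer` loop is in use: `on_step_end` fires *after* `optimizer.step()` and `zero_grad(set_to_none=True)` have already run, so `param.grad` is `None` at that point even when training is working correctly (the nonzero `grad_norm` in the log above suggests gradients do exist during the step). The minimal PyTorch sketch below reproduces this timing with a plain `nn.Linear`; the model and optimizer here are hypothetical stand-ins, not LLaVA's:

```python
import torch

# Stand-in model and optimizer (hypothetical, for illustration only)
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Forward + backward: gradients exist right after backward()
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
grad_present = all(p.grad is not None for p in model.parameters())

# The Trainer then steps and zeroes grads (set_to_none=True is the default),
# so by the time on_step_end fires, .grad is None again
opt.step()
opt.zero_grad(set_to_none=True)
grad_after = any(p.grad is not None for p in model.parameters())

print(grad_present, grad_after)  # gradients before zeroing, none after
```

If this is the cause, inspecting gradients from a callback that runs before the optimizer update (recent `transformers` versions expose hooks such as `on_pre_optimizer_step`; check your installed version) should show the mm_projector gradients. Note also that wrappers like DeepSpeed ZeRO keep gradients in their own buffers, so `param.grad` can be `None` even mid-step in those setups.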