graviraja / MLOps-Basics


Is training happening?

rohitgr7 opened this issue · comments

def training_step(self, batch, batch_idx):
    logits = self.forward(batch["input_ids"], batch["attention_mask"])
    loss = F.cross_entropy(logits, batch["label"])
    self.log("train_loss", loss, prog_bar=True)

Here the loss is not returned; is the model even training?

@rohitgr7 we are logging it to the logger. There is no need to return the loss unless you want to perform some operation on the overall loss in an epoch. I have done that in week 1 for the validation step. Refer here: https://github.com/graviraja/MLOps-Basics/blob/main/week_1_wandb_logging/model.py. If you return the loss, you can access it in the training_epoch_end method.
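
For illustration, a minimal sketch of that pattern, assuming the pre-2.0 Lightning API (training_epoch_end receives the collected training_step outputs; the train_loss_epoch key is just an example name):

def training_step(self, batch, batch_idx):
    logits = self.forward(batch["input_ids"], batch["attention_mask"])
    loss = F.cross_entropy(logits, batch["label"])
    self.log("train_loss", loss, prog_bar=True)
    # returning a dict makes the key explicit when the outputs
    # are collected at the end of the epoch
    return {"loss": loss}

def training_epoch_end(self, outputs):
    # outputs is a list with one entry per training_step call
    avg_loss = torch.stack([x["loss"] for x in outputs]).mean()
    self.log("train_loss_epoch", avg_loss)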

@graviraja I checked; Lightning does not look in logged_metrics for the loss to perform backprop. I get this warning when nothing is returned from training_step: training_step returned None. If this was on purpose, ignore this warning...

Also, the docs mention that if nothing is returned, the corresponding training_step is skipped: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#training-step

A minimal example to reproduce: https://colab.research.google.com/drive/11qA_1RxcEcHkiY-Xn5EsOR8ZH0wG8O1j#scrollTo=AAtq1hwSmjKe

Fixed it. Thank you @rohitgr7
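
For anyone who finds this later, the fix presumably amounts to returning the loss from training_step so Lightning can run the backward pass; a minimal sketch of the corrected method:

def training_step(self, batch, batch_idx):
    logits = self.forward(batch["input_ids"], batch["attention_mask"])
    loss = F.cross_entropy(logits, batch["label"])
    self.log("train_loss", loss, prog_bar=True)
    # returning the loss lets Lightning call backward() and step the
    # optimizer; returning nothing skips optimization for this batch
    return loss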