ashleve / lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

`trainer.callback_metrics` included in `metric_dict` after training doesn't make sense

libokj opened this issue

After checking https://github.com/Lightning-AI/lightning/blob/105b25c521e0cbc5d7b1160902ce7b64ae7c8c73/src/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py, it became clear that after a complete `trainer.fit` loop, `trainer.callback_metrics` holds only the metrics from the last training and validation epochs. I don't think it makes much sense to include the last epoch's training and validation metrics in `metric_dict` for sweeper optimization.
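To make the behavior concrete, here is a small self-contained sketch (the model and data are made up purely for illustration, not taken from the template) showing that `trainer.callback_metrics` after `trainer.fit` only reflects the most recent epoch:

```python
import torch
import lightning.pytorch as pl


class DemoModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).mean()
        self.log("train/loss", loss, on_epoch=True)
        return loss

    def validation_step(self, batch, batch_idx):
        self.log("val/loss", self.layer(batch).mean())

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


loader = torch.utils.data.DataLoader(torch.randn(32, 4), batch_size=8)

trainer = pl.Trainer(max_epochs=3, logger=False, enable_checkpointing=False)
trainer.fit(DemoModel(), train_dataloaders=loader, val_dataloaders=loader)

# Only the values logged during the LAST epoch survive here, even if an
# earlier epoch had the best validation loss:
print(trainer.callback_metrics)  # e.g. {"train/loss": tensor(...), "val/loss": tensor(...), ...}
```

Since the template builds `metric_dict` from exactly this dict, the value handed to the sweeper for a monitored metric is the last epoch's value, not the best one.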

There doesn't seem to be a straightforward way to get the training and validation metrics of the best model as monitored by `ModelCheckpoint`, but I think there are two viable options: a custom callback based on `ModelCheckpoint` that saves the best model's metrics (e.g. a `best_model_metrics` attribute), or simply running `trainer.test(..., ckpt_path="best")` on the training and validation datasets respectively. A sketch of the first option follows.
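A rough sketch of the callback option, assuming one subclasses `ModelCheckpoint` and snapshots `trainer.callback_metrics` whenever a new best checkpoint is written (the class name and `best_model_metrics` attribute are made up here, not part of the template or Lightning):

```python
from copy import deepcopy

from lightning.pytorch.callbacks import ModelCheckpoint


class BestMetricsModelCheckpoint(ModelCheckpoint):
    """ModelCheckpoint that also remembers trainer.callback_metrics
    from the epoch that produced the current best checkpoint."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.best_model_metrics = {}

    def on_validation_end(self, trainer, pl_module):
        previous_best = self.best_model_path
        super().on_validation_end(trainer, pl_module)  # may write a new best checkpoint
        if self.best_model_path != previous_best:
            # a new best checkpoint was just saved -> snapshot this epoch's metrics
            self.best_model_metrics = deepcopy(trainer.callback_metrics)
```

The second option needs no custom code: after `trainer.fit`, calling something like `trainer.validate(model, datamodule=datamodule, ckpt_path="best")` (and analogously `trainer.test(..., ckpt_path="best")`) re-evaluates the best checkpoint, so the metrics that end up in `trainer.callback_metrics` before `metric_dict` is assembled belong to the best model rather than the last epoch.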