Using the MLflow logger produces inconsistent metric plots
gboeer opened this issue
Bug description
When using the MLFlowLogger I have noticed that, in some cases, the plots produced in the Model metrics overview section of the MLflow web app are messed up.
The plots for the same metric are displayed correctly in the detail view, and hence look very different from the plots in the overview tab.
I am not absolutely sure, but I think this may have to do with how the step parameter is propagated to MLflow, or with how the global_step is calculated.
In my current experiment I use a large training set and a smaller validation set, and I have set the Trainer to log_every_n_steps=20.
For the training steps this seems to work fine (the plots all look good), but I suspect that during validation this logging interval is larger than the total number of batches in a validation pass. Even if that is the case, I still wonder why the plots in the detailed view of the validation metrics all look fine, while only the plots in the Model metrics overview are messed up.
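If that is what happens, the symptom is consistent with duplicate x-values: while the trainer runs validation, global_step does not advance, so every manually logged point in one validation pass lands on the same step. The following is a minimal plain-Python sketch of that effect; the log_metric stand-in is hypothetical and only records points, it is not the real MLflow API.

```python
# Hypothetical stand-in for a metric logger: it just records (name, step, value).
history = []

def log_metric(name, value, step):
    """Record one metric point, like a logger backend would."""
    history.append((name, step, value))

# Assumption: global_step is frozen at the last training step while validating.
global_step = 100

# Five validation batches, each logging its own accuracy at the same step.
for batch_idx in range(5):
    val_accuracy = 0.8 + 0.01 * batch_idx
    log_metric("val_accuracy", val_accuracy, step=global_step)

steps = [step for _, step, _ in history]
print(steps)  # every point shares the same x-value
```

A chart that assumes one value per step (as an overview plot might) has no well-defined way to draw five points stacked on a single x-value, which could explain why the overview and detail views disagree.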
During the validation step I tried using the normal Lightning self.log, as well as self.logger.log_metrics, self.logger.experiment.log_metric, and the direct API mlflow.log_metric, all of which lead to similar results (though not the exact same plots).
def validation_step(self, batch, batch_idx):
    inputs, labels, _ = batch
    outputs = self.model(inputs)
    loss = self.val_criterion(outputs, labels)
    # Per-batch accuracy: fraction of correct top-1 predictions.
    _, predictions = torch.max(outputs, 1)
    val_accuracy = torch.sum(predictions == labels.data).double() / labels.size(0)
    # 1) Standard Lightning logging.
    self.log("val_accuracy", val_accuracy)
    # 2) Logger-level API with an explicit step.
    self.logger.log_metrics({"logger_val_accuracy": val_accuracy}, step=self.global_step)
    # 3) Underlying MLflow client, also with an explicit step.
    self.logger.experiment.log_metric(key="logger_experiment_val_accuracy", value=val_accuracy, step=self.global_step, run_id=self.logger.run_id)
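One way to avoid several per-batch points piling up at a frozen step is to aggregate over the whole validation pass and emit a single value per epoch (which, as I understand it, is roughly what Lightning's self.log does by default in validation_step via on_epoch=True). A plain-Python sketch of that aggregation, with hypothetical function names mirroring the hook structure:

```python
# Hypothetical sketch (no Lightning involved): collect per-batch accuracies,
# then emit one aggregated point per validation epoch.
batch_accuracies = []

def validation_step_like(correct, total):
    """Accumulate one batch's accuracy instead of logging it immediately."""
    batch_accuracies.append(correct / total)

def on_validation_epoch_end_like(step):
    """Average the batch accuracies and return a single metric point."""
    epoch_acc = sum(batch_accuracies) / len(batch_accuracies)
    batch_accuracies.clear()
    return ("val_accuracy", step, epoch_acc)

for correct, total in [(8, 10), (9, 10), (7, 10)]:
    validation_step_like(correct, total)
print(on_validation_epoch_end_like(step=100))  # one point for the whole epoch
```

With one point per step, the overview and detail views should at least receive the same data, which would help narrow down whether the discrepancy is in what gets logged or in how MLflow renders it.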
See the following images, which illustrate the plots for each of those calls: [images not included]
For comparison, the detailed metric view from the same experiment: [image not included]
I would like to point out that I don't see this behavior in other experiments, which usually use smaller datasets and a smaller log_every_n_steps; so far I have not been able to reproduce this issue with those smaller setups.
Edit: As another side note, I also use the same val_accuracy metric (the one I log with the simple self.log()) as the monitor for the ModelCheckpoint, which works as expected. So internally the metric is calculated and handled correctly, and the detailed metric plot also confirms this. Only the overview pane for all metrics shows this strange behavior, for some reason.
What version are you seeing the problem on?
v2.2
How to reproduce the bug
No response
Error messages and logs
No response
Environment
Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
More info
No response