Add a TensorBoard log for the final training loss value
rnyak opened this issue
Describe the bug
When I check the loss plot in TensorBoard, the validation curve extends to a much higher step count than the training curve (see the screenshot below). Why does training end 1062 steps before validation? What's the logic behind this?
@benleetownsend, do you have an explanation for this? Thanks in advance.
We explicitly run validation on the final model regardless of `val_interval`. This is for `keep_best_model`: without it, the final set of steps would never be evaluated, so they could be wasted. The training loss, on the other hand, just follows the default logging schedule. If you want to track loss values near the end of training, you could change `val_interval` so that a scheduled value lands near the end of training.
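To make the schedule arithmetic concrete, here is a minimal sketch. It assumes a periodic event fires every `interval` steps counted from step 0, and that only validation gets the extra forced run at the final step; the function names and step counts are hypothetical and not part of the library.

```python
from typing import List


def last_periodic_step(total_steps: int, interval: int) -> int:
    """Last step at which an 'every `interval` steps' event fires (0 if none)."""
    return (total_steps // interval) * interval


def gap_to_end(total_steps: int, interval: int) -> int:
    """Steps between the last periodic event and the end of training."""
    return total_steps - last_periodic_step(total_steps, interval)


def intervals_that_reach_end(total_steps: int, lo: int, hi: int) -> List[int]:
    """Candidate intervals in [lo, hi] whose last periodic event is the final step."""
    return [n for n in range(lo, hi + 1) if total_steps % n == 0]


if __name__ == "__main__":
    total = 10_000  # hypothetical run length
    # A periodic-only curve stops short of the end of training...
    print(last_periodic_step(total, 1500), gap_to_end(total, 1500))  # 9000 1000
    # ...while intervals that divide the run length land exactly on the last step.
    print(intervals_that_reach_end(total, 1000, 2600))  # [1000, 1250, 2000, 2500]
```

Under these assumptions, a validation run forced at step 10000 would sit 1000 steps past the last periodic training-loss point, which is the kind of offset described in the question.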
I'm going to rename this issue to track the feature of adding a final-step training loss log.
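The feature tracked here amounts to giving the training loss the same treatment as validation: log on the usual schedule, plus once at the final step. A minimal sketch of that decision rule, assuming a fixed logging interval; `should_log_train_loss` is a hypothetical helper, not an existing function in the library.

```python
def should_log_train_loss(step: int, total_steps: int, log_every: int) -> bool:
    """Return True if the training loss should be logged at `step` (1-indexed).

    Fires every `log_every` steps, and also on the final step even when it is
    not a multiple of `log_every`, so the training curve ends where validation does.
    """
    on_schedule = step % log_every == 0
    is_final = step == total_steps
    return on_schedule or is_final


if __name__ == "__main__":
    total, every = 10_000, 1500  # hypothetical values
    logged = [s for s in range(1, total + 1) if should_log_train_loss(s, total, every)]
    print(logged)  # [1500, 3000, 4500, 6000, 7500, 9000, 10000]
```

Only the when-to-log decision is shown; the actual write to TensorBoard would go through whatever summary mechanism the training loop already uses.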
Hopefully this answers your question.