Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai

Logging with Fabric using steps

liambsmith opened this issue · comments

Description & Motivation

Logging with Fabric does not take training steps into account, unlike the Lightning Trainer. When a LightningModule calls self.log under Fabric, the logged dictionary is passed straight through to Fabric's logging code with nothing else, whereas under the Trainer the same call goes through grouping and frequency handling (for example, aggregating across processes during multi-GPU training, or only logging every N steps, 50 by default).
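To make the gap concrete: today a Fabric user who wants Trainer-like behavior has to implement the step windowing by hand. A minimal, hypothetical sketch of such a wrapper (the StepAggregator name and API are invented here; in real use log_fn would be something like fabric.log_dict, which accepts a step argument):

```python
from collections import defaultdict


class StepAggregator:
    """Hypothetical helper: average logged values over a window of steps
    before forwarding them, similar to the Trainer's log_every_n_steps=50."""

    def __init__(self, log_fn, every_n_steps=50):
        self.log_fn = log_fn  # e.g. fabric.log_dict in real use
        self.every_n_steps = every_n_steps
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)
        self._step = 0

    def log_dict(self, metrics):
        # buffer values instead of emitting them immediately
        for name, value in metrics.items():
            self._sums[name] += value
            self._counts[name] += 1

    def step(self):
        # called once per training step; flush averages every N steps
        self._step += 1
        if self._step % self.every_n_steps == 0 and self._sums:
            averaged = {k: self._sums[k] / self._counts[k] for k in self._sums}
            self.log_fn(averaged, step=self._step)
            self._sums.clear()
            self._counts.clear()
```

This is exactly the kind of boilerplate the feature request would make unnecessary.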

Pitch

An option to enable logging in Fabric similar to the Lightning Trainer. It could be off by default and track steps submitted through Fabric hooks/calls, such as:

fabric.call('on_train_step')

This would allow logged values to be aggregated within the same step, which makes logs more readable.
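The hook-driven idea could look roughly like this. Everything below is a pure-Python mock: MiniFabric only imitates how fabric.call dispatches a hook name to registered callbacks, and 'on_train_step' is the hypothetical hook name from the pitch, not an existing Fabric hook:

```python
class MiniFabric:
    """Mock of Fabric's callback dispatch: fabric.call("hook_name")
    invokes that method on every registered callback object."""

    def __init__(self, callbacks):
        self.callbacks = callbacks

    def call(self, hook_name, *args, **kwargs):
        for cb in self.callbacks:
            fn = getattr(cb, hook_name, None)
            if callable(fn):
                fn(*args, **kwargs)


class StepLogAggregator:
    """Hypothetical callback: buffer metrics until 'on_train_step' fires,
    then emit one merged dict for the whole step."""

    def __init__(self, emit):
        self.emit = emit  # e.g. a logger's log_dict in real use
        self.buffer = {}
        self.step = 0

    def log_dict(self, metrics):
        self.buffer.update(metrics)

    def on_train_step(self):
        self.step += 1
        if self.buffer:
            self.emit(dict(self.buffer), self.step)
            self.buffer.clear()
```

With this shape, several log_dict calls made during one step collapse into a single record when the step hook fires.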

Alternatives

No response

Additional context

No response

cc @Borda

It might be better to have a function that accepts a log handler which, by default, logs the way it works now but can be customized by the user.

This could be done during setup, by passing an optional function that is called whenever Fabric receives logs from a LightningModule.
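A rough sketch of what such a pluggable handler might look like. Nothing here is an existing Fabric API: the handler signature, default_log_handler, and make_every_n_handler are all invented for illustration; the point is only that the default handler preserves today's pass-through behavior while users can swap in their own:

```python
# Hypothetical log-handler API: a handler receives the downstream logging
# callable plus the metrics, and decides if/how to forward them.

def default_log_handler(logger_fn, metrics, step=None):
    # default: current Fabric behavior, forward logs unchanged
    logger_fn(metrics, step)


def make_every_n_handler(n):
    """User-supplied handler that only forwards every n-th step."""
    def handler(logger_fn, metrics, step=None):
        if step is None or step % n == 0:
            logger_fn(metrics, step)
    return handler
```

A user would then pass one of these during setup (e.g. a hypothetical `log_handler=` argument), and Fabric would route every incoming self.log dictionary through it.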