Stonesjtu / pytorch_memlab

Profiling and inspecting memory in pytorch

Documentation for pl.LightningModule that includes many nn.Modules

turian opened this issue

I have a pl.LightningModule (pytorch-lightning) that includes many nn.Modules.

It's not obvious from the documentation how I can profile all the LightningModule tensors and the subordinate Module tensors. Could you please provide an example?

Here is an example:

https://colab.research.google.com/github/PytorchLightning/pytorch-lightning/blob/master/notebooks/01-mnist-hello-world.ipynb

In my code (not the colab above, but a similar style), I don't OOM when I create the model. I OOM when I run

trainer.fit(model)

How do I memory-profile to find out why I OOM?

Thanks for reporting. I'll investigate the integration with pytorch-lightning this weekend.

But in principle, the only thing that needs to be done is to add the forward function to the line_profiler.
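
For reference, here is a minimal sketch of that idea using the LineProfiler context manager from pytorch_memlab (the module and tensor sizes are made up for illustration, and I'm assuming the unbound Net.forward function can be registered directly, like the module-level functions in the README):

import torch
from pytorch_memlab import LineProfiler

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        return self.fc(x)

net = Net().cuda()
x = torch.randn(32, 1024, device='cuda')

# Register forward with the line profiler, run it once,
# then print per-line memory statistics.
with LineProfiler(Net.forward) as prof:
    net(x)
prof.display()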

It looks like our current implementation cannot profile the detailed memory usage inside an nn.Module. However, you can work around this by defining a dummy container Module like:

import torch.nn as nn
import pytorch_lightning as pl
from pytorch_memlab import profile

class Net(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # illustrative layer and arguments; use your own modules here
        self.conv1 = nn.Conv1d(16, 32, kernel_size=3)

    @profile  # prints line-by-line memory usage of forward
    def forward(self, input):
        out = self.conv1(input)
        return out
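
As a side note on the original ask (inspecting all the LightningModule tensors and the subordinate Module tensors): MemReporter from pytorch_memlab walks a model's tensors, including those of its child Modules, and reports them per device. A minimal sketch with an illustrative toy model:

import torch
from pytorch_memlab import MemReporter

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).cuda()

# Run a forward/backward pass first so gradient tensors exist too.
out = model(torch.randn(32, 1024, device='cuda'))
out.sum().backward()

# Reports every tensor owned by the model (including submodules).
reporter = MemReporter(model)
reporter.report()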

@Stonesjtu if I have an nn.Module that contains other nn.Modules (which in turn contain other nn.Modules), do I add the @profile decorator to all the nn.Modules to see what is happening? Thank you for the help.

A common workflow is to profile top-down: decorate the outermost functions first, then move deeper only where they report large allocations, as sketched below. Usually 2 or 3 @profile decorators should give you overall memory-consumption statistics.
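
For instance, with a lightning module you might start by decorating training_step and the top-level forward, and only push @profile one level deeper where those reports show large allocations. A sketch of that workflow (the layer sizes and loss are illustrative, not from this issue):

import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from pytorch_memlab import profile

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(1024, 1024)
        self.head = nn.Linear(1024, 10)

    @profile  # level 1: the whole forward pass
    def forward(self, x):
        return self.head(torch.relu(self.encoder(x)))

    @profile  # level 2: forward plus loss computation
    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())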

@Stonesjtu wanted to ping on this issue to see if there is a better way to use memlab with lightning now.