tfjgeorge / nngeometry

{KFAC,EKFAC,Diagonal,Implicit} Fisher Matrices and finite width NTKs in PyTorch

Home Page: https://nngeometry.readthedocs.io

Implementing BatchNorm for KFAC

a-cowlagi opened this issue · comments

Hello,

I am trying to use batch normalization in my network trained on CIFAR. The network has about 50,000 parameters, and I want to use the KFAC representation to speed up computations. However, it looks like BatchNorm2d is unimplemented for KFAC. Would it be possible to add this implementation?

As a follow-up, here is the network I am using:

import torch.nn as nn

class View(nn.Module):
    """Reshape the conv output to (batch_size, n) for classification."""
    def __init__(self, n):
        super().__init__()
        self.n = n

    def forward(self, x):
        return x.view(-1, self.n)

class allcnn_t(nn.Module):
    def __init__(self, c1=16, c2=32):
        super().__init__()
        d = 0  # dropout probability for the inner Dropout layers

        def convbn(ci, co, ksz, s=1, pz=0):
            # conv -> ReLU -> batch norm block
            return nn.Sequential(
                nn.Conv2d(ci, co, ksz, stride=s, padding=pz),
                nn.ReLU(),
                nn.BatchNorm2d(co))

        self.m = nn.Sequential(
            nn.Dropout(0.2),
            convbn(3, c1, 3, 1, 1),
            convbn(c1, c1, 3, 1, 1),
            convbn(c1, c1, 3, 2, 1),
            nn.Dropout(d),
            convbn(c1, c2, 3, 1, 1),
            convbn(c2, c2, 3, 1, 1),
            convbn(c2, c2, 3, 2, 1),
            nn.Dropout(d),
            convbn(c2, c2, 3, 1, 1),
            convbn(c2, c2, 3, 1, 1),
            convbn(c2, 10, 1, 1),
            nn.AvgPool2d(8),
            View(10))

        print('Num parameters: ', sum(p.numel() for p in self.m.parameters()))

    def forward(self, x):
        return self.m(x)

Here is what I am using for the Fisher:

from nngeometry.metrics import FIM_MonteCarlo
from nngeometry.object import PMatBlockDiag

fisher = FIM_MonteCarlo(model=model.cpu(),
                        loader=train_loader,
                        representation=PMatBlockDiag,
                        device='cpu')

Is there any immediately obvious way to speed up the Fisher computation, besides putting it on the GPU?

Hello,

Unfortunately, it is not really clear what to do with BatchNorm layers when trying to apply KFAC:

  1. factorize the batch norm parameters in some way, or
  2. use the full Fisher for the block corresponding to the parameters of each batch norm layer.

But instead of hard-coding one of these 2 options, I prefer to leave the choice to the user.

I personally prefer 2., which can be implemented using the example in https://github.com/tfjgeorge/nngeometry/blob/master/examples/FIM%20for%20EWC.ipynb (scroll down to "KFAC and Batch norm layers"). In essence, it consists of using 2 separate block-diagonal FIMs: one for the batch norm parameters using PMatBlockDiag, and another one for the "standard" parameters using PMatKFAC, along the lines of the sketch below.
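
A rough sketch of that two-FIM setup, assuming nngeometry's LayerCollection can be used to select the two layer subsets (the split loop and the names lc_bn / lc_rest below are illustrative; the linked notebook shows the canonical version):

from nngeometry.metrics import FIM_MonteCarlo
from nngeometry.object import PMatKFAC, PMatBlockDiag
from nngeometry.layercollection import LayerCollection

# Split the model's layers into batch norm layers and everything else.
# (Illustrative split; adapt as in the linked notebook.)
lc_all = LayerCollection.from_model(model)
lc_bn, lc_rest = LayerCollection(), LayerCollection()
for layer_id, layer in lc_all.layers.items():
    if 'BatchNorm' in layer.__class__.__name__:
        lc_bn.add_layer(layer_id, layer)
    else:
        lc_rest.add_layer(layer_id, layer)

# KFAC for the conv/linear layers, exact block-diagonal for batch norm.
F_kfac = FIM_MonteCarlo(model=model, loader=train_loader,
                        representation=PMatKFAC,
                        layer_collection=lc_rest, device='cpu')
F_bn = FIM_MonteCarlo(model=model, loader=train_loader,
                      representation=PMatBlockDiag,
                      layer_collection=lc_bn, device='cpu')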

For your 2nd question, to speed up computation you can:

  • reduce the dataset size
  • use a GPU
  • change to a more efficient representation (e.g. PMatKFAC)
  • reduce the size of the layer with the most parameters, which will likely be the bottleneck

Thanks, this is helpful -- I will adopt the second approach! A follow-up related to eigendecompositions using the KFAC representation: I understand that the eigenvalues in this representation can be found by taking the product of every pair of eigenvalues of the two Kronecker factors, then concatenating across layers to get the full spectrum. I also understand that the eigenvectors are given by the Kronecker product between eigenvectors of the Kronecker factors, but I am struggling to implement this efficiently -- something better than simply looping over the eigenvectors. Do you have any recommendation as to how to get the eigenvectors of a given block? I would like to construct a matrix where each row/column is an eigenvector, with rows sorted by eigenvalue. I appreciate any help!
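
For reference, a minimal sketch of that construction for one layer's block, assuming the two symmetric Kronecker factors A (m x m) and B (n x n) are available as dense tensors; note that it materializes the full mn x mn eigenvector matrix, so it is only feasible for moderately sized layers:

import torch

def kfac_block_eigh(A, B):
    """Eigendecomposition of A kron B from the eigendecompositions
    of its symmetric factors, sorted by descending eigenvalue."""
    evals_a, evecs_a = torch.linalg.eigh(A)
    evals_b, evecs_b = torch.linalg.eigh(B)
    # Eigenvalues of A kron B are all pairwise products; index k*n + l
    # holds evals_a[k] * evals_b[l], matching the column order below.
    evals = torch.outer(evals_a, evals_b).reshape(-1)
    # Eigenvectors are Kronecker products of factor eigenvectors;
    # torch.kron builds all of them at once: column k*n + l equals
    # kron(evecs_a[:, k], evecs_b[:, l]).
    evecs = torch.kron(evecs_a, evecs_b)
    order = torch.argsort(evals, descending=True)
    # Columns are eigenvectors; transpose if you want one per row.
    return evals[order], evecs[:, order]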

The link you shared seems to be broken (https://github.com/tfjgeorge/nngeometry/blob/master/examples/FIM%20for%20EWC.ipynb). Could you please update it?
Also, is there any way of using batch norm layers without implementing or modifying the code myself?