Scaling of parameter space representations
ksnxr opened this issue · comments
Many thanks for this interesting library!
Comparing against the analytical expression, I think the provided dense representation of the Fisher information matrix is computed as an expectation (a mean) over the data points in the train loader. Are the other representations, e.g. KFAC and EKFAC, on the same scale? Or is there a constant scaling factor, e.g. the batch size, that we should be aware of?
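To make the scaling question concrete, here is a minimal sketch in plain Python. It does not use the library's API: `fisher_logistic`, the toy inputs `xs`, and the weights `w` are hypothetical names chosen for illustration, and the model is assumed to be a simple logistic regression, for which the analytical Fisher is known in closed form. It shows the constant factor I would expect between a mean-reduced and a sum-reduced Fisher.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fisher_logistic(xs, w, reduction="mean"):
    """Analytical Fisher for logistic regression p(y=1|x) = sigmoid(w . x).

    Each data point contributes p * (1 - p) * x x^T. With reduction="mean"
    the contributions are averaged over the data points; with "sum" they
    are only accumulated.
    """
    d = len(w)
    fisher = [[0.0] * d for _ in range(d)]
    for x in xs:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        coeff = p * (1.0 - p)
        for a in range(d):
            for b in range(d):
                fisher[a][b] += coeff * x[a] * x[b]
    if reduction == "mean":
        n = len(xs)
        fisher = [[v / n for v in row] for row in fisher]
    return fisher


# Hypothetical toy data: 4 points in 2 dimensions.
xs = [[1.0, 0.5], [0.2, -1.0], [-0.7, 0.3], [1.5, 1.0]]
w = [0.3, -0.2]

f_mean = fisher_logistic(xs, w, reduction="mean")
f_sum = fisher_logistic(xs, w, reduction="sum")

# The two differ by a constant factor equal to the number of data points.
ratio = f_sum[0][0] / f_mean[0][0]
print(ratio, len(xs))
```

If an approximation were accumulated as a sum rather than a mean, every entry would be off by this same constant, which is the kind of mismatch I would like to rule out.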
Thanks for your quick response. My use case requires approximations that are on the same scale as the analytical Fisher. Since I couldn't find anything about this in the project, I figured it would be better to verify.