Between np.trace(whole term) and sum of np.trace(each term)

Question

miqbal23 opened this issue 6 years ago

Hi, I noticed that the final FID value is computed on this line:

return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2 * tr_covmean

But according to your formula, the trace is applied once, after both sigmas and covmean have been combined, so it should be:

return diff.dot(diff) + np.trace(sigma1 + sigma2 - 2 * covmean)

I have tried both, and the results agree to about 13 decimal places. Is there a reason for using the former?

Thomas Unterthiner · Answer 1 · Mon Nov 26 2018 19:08:01 GMT+0800 (China Standard Time)

Hi! The two expressions are mathematically equivalent because the trace operator is linear: tr(A + B) = tr(A) + tr(B) and tr(cA) = c tr(A). So for the result it doesn't really matter which one we use. We picked the former because it runs slightly faster: a trace is O(N), while adding two matrices is O(N^2) (where N is the side length of the matrix). But this is of course negligible compared to computing the activations anyway.
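
The equivalence can be checked numerically with a small sketch. The variable names mirror the snippet quoted in the question, but everything else is an assumption made for illustration: the matrices are random, and `covmean` here is just a random square matrix standing in for the real matrix square root, which is enough because linearity of the trace holds for any square matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64  # side length of the matrices (illustrative value)


def random_cov(rng, n):
    """Return a random symmetric positive semi-definite n x n matrix."""
    a = rng.standard_normal((n, n))
    return a @ a.T / n


sigma1 = random_cov(rng, n)
sigma2 = random_cov(rng, n)
# Stand-in for the matrix square root term; NOT computed the way FID does it.
covmean = rng.standard_normal((n, n))
diff = rng.standard_normal(n)  # stand-in for the difference of the means

# Variant 1: three traces, each O(n), combined as scalars.
tr_covmean = np.trace(covmean)
fid_separate = diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2 * tr_covmean

# Variant 2: combine the matrices first, O(n^2), then take one trace.
fid_combined = diff.dot(diff) + np.trace(sigma1 + sigma2 - 2 * covmean)

# The two differ only by floating-point rounding.
print(abs(fid_separate - fid_combined))
```

The small remaining difference comes from the order in which floating-point additions are performed, which matches the agreement to roughly 13 decimal places reported in the question.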

Muhamad Iqbal · Answer 2 · Tue Nov 27 2018 13:23:56 GMT+0800 (China Standard Time)

I see. I have confirmed it by reading up on the trace operator again. Thank you for the explanation.