About dot product approximation
ohwi opened this issue · comments
Hi. First of all, thank you for your work!
I have a little question about the dot product approximation.
I have read earlier issues about the dot product approximation, like this and this.
In both explanations, the Taylor approximation requires the student model to evaluate the labeled images both before and after the update.
However, according to equation 12 in the paper, the student model does not see the labeled images before the update.
I think the two vectors in the dot product should come from `s_loss_us_old` and `s_loss_l_new`, if I follow the variable names in your code.
I'm wondering how the line `dot_product = s_loss_l_new - s_loss_l_old` approximates the dot product.
Can you help me figure out what I am missing?
I've understood the equations. Thank you!
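
For later readers with the same question, here is a minimal numerical sketch of the first-order Taylor argument. It is not the repository's code. It assumes a plain SGD step on the unlabeled loss, `theta_new = theta_old - lr * grad_u(theta_old)`, and uses toy quadratic losses. Every name in it (`loss`, `grad`, `labeled_target`, `unlabeled_target`) is hypothetical. Under that assumption, the change in the labeled loss across the update is approximately `-lr * dot(grad_l, grad_u)`, both gradients taken at the old parameters, so a difference of two scalar losses can stand in for the dot product up to sign and a factor of `lr`.

```python
# Toy check: L_l(theta_new) - L_l(theta_old) ~= -lr * <grad L_l, grad L_u>
# Assumption: one plain SGD step on the unlabeled loss, no momentum or weight decay.

def loss(theta, target):
    """Toy quadratic loss 0.5 * ||theta - target||^2, a stand-in for a real loss."""
    return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, target))

def grad(theta, target):
    """Analytic gradient of the toy loss with respect to theta."""
    return [t - c for t, c in zip(theta, target)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

lr = 1e-3
theta_old = [0.5, -1.0, 2.0]
labeled_target = [1.0, 0.0, 1.5]     # plays the role of the labeled batch
unlabeled_target = [0.0, -2.0, 3.0]  # plays the role of the pseudo-labeled batch

# Gradients of both losses at the OLD parameters.
g_l = grad(theta_old, labeled_target)
g_u = grad(theta_old, unlabeled_target)

# Student update uses only the unlabeled (pseudo-labeled) loss.
theta_new = [t - lr * g for t, g in zip(theta_old, g_u)]

# Scalar difference of the labeled loss after and before the update.
loss_diff = loss(theta_new, labeled_target) - loss(theta_old, labeled_target)

# First-order Taylor prediction of that difference.
first_order = -lr * dot(g_l, g_u)

# The gap is second order in lr, so it shrinks quickly as lr gets smaller.
gap = abs(loss_diff - first_order)
print(loss_diff, first_order, gap)
```

In this sketch the sign and the `lr` factor are explicit. A real training loop may fold them into the learning rate or flip the subtraction order, so check the sign convention against the code you are reading.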