Equation (5) - partial derivative of the Euclidean norm
andgitchang opened this issue · comments
Hi,
I would like to know why you defined the L2-distance as in Equation (14) of the appendix.
Doesn't the L2-distance need a square root outside the summations?
Also, how is the corresponding partial derivative of the L2-distance in Equation (5) derived?
Thanks.
We defined the L2-distance to further investigate different metrics in neural networks; AdderNets themselves still use the L1 distance.
The partial derivative of the L2-distance uses its original derivative.
I understand that you use the L1 distance in the forward pass and the full-precision L2 derivative in backward optimization.
But my questions are:
- Considering the L2 distance (see its definition), doesn't Eq.(14) in your CVPR 2020 supplementary need an extra square root outside the summations?
- Following the definition of the L2 distance, shouldn't its derivative in Eq.(5) of AdderNets be \partial ||x||_2 / \partial x = x / ||x||_2? (please refer to the p-norm subsection under Examples)
If I have misunderstood anything, please correct me. Thanks.
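As a sanity check on the question above, the gradient of the true (square-rooted) L2 norm can be verified numerically. This is a minimal NumPy sketch, not code from the AdderNet repository; the function name `l2_norm` is my own. It confirms that the derivative of ||x||_2 is x / ||x||_2, which differs from the 2x one would get without the square root:

```python
import numpy as np

# True L2 norm: ||x||_2 = sqrt(sum(x**2))
def l2_norm(x):
    return np.sqrt(np.sum(x ** 2))

rng = np.random.default_rng(1)
x = rng.standard_normal(4)

# Central finite-difference gradient of l2_norm at x
eps = 1e-6
num_grad = np.array([
    (l2_norm(x + eps * e) - l2_norm(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

# Analytic gradient of the square-rooted norm: x / ||x||_2
print(np.allclose(num_grad, x / l2_norm(x), atol=1e-5))
```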
Yes, we ultimately use the L1-AdderNet in the main paper; the L2-AdderNet is proposed only for investigation.
In fact we use the squared L2 distance (L2^2), as defined in our supplementary material, so no square root appears.
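This resolves the discrepancy: for the squared L2 distance the square root drops out, so the gradient is simply 2(x - w) with no 1/||x||_2 factor. A small NumPy check (my own sketch, not repository code; `squared_l2` is an assumed name) verifies this numerically:

```python
import numpy as np

# Squared L2 distance, as in Eq.(14) of the supplementary: sum((x - w)**2)
def squared_l2(x, w):
    return np.sum((x - w) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
w = rng.standard_normal(5)

# Central finite-difference gradient w.r.t. x
eps = 1e-6
num_grad = np.array([
    (squared_l2(x + eps * e, w) - squared_l2(x - eps * e, w)) / (2 * eps)
    for e in np.eye(5)
])

# Analytic gradient of the squared distance: 2*(x - w), no square root
print(np.allclose(num_grad, 2 * (x - w), atol=1e-5))
```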
Thanks for your detailed explanation.