huawei-noah/AdderNet

Code for the paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"

Equation (5) - partial derivative of the Euclidean norm

andgitchang opened this issue

Hi,
I would like to know why you defined the L2-distance as in Equation (14) of the appendix.
Doesn't the L2-distance need a square root outside the summations?
I would also like to know how the corresponding partial derivative of the L2-distance in Equation (5) is derived.
Thanks.

We defined the L2-distance to further investigate different metrics in neural networks; we still use the L1 distance in AdderNets.

The partial derivative of the L2-distance is simply its original (exact) derivative, as sketched below.
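
For reference, a minimal sketch of the calculus behind that reply, assuming the Eq. (14) definition is a sum of squares with no square root (as confirmed later in this thread), with X an input patch and F a filter:

```latex
\[
  d_2(X, F) = \sum_{i} (X_i - F_i)^2 ,
  \qquad
  \frac{\partial\, d_2(X, F)}{\partial F_i} = -2\,(X_i - F_i) .
\]
```

Up to the constant factor 2 and the sign convention of the output, this is the full-precision gradient X - F used in Eq. (5).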

I know you use the L1 distance in the forward pass and the full-precision L2 derivative in the backward optimization.
But my questions are:

  1. Considering the L2 distance (see the definition), don't we need an extra square root outside the summations of Eq. (14) in your CVPR 2020 supplementary material?
  2. Following the definition of the L2 distance, shouldn't its derivative in Eq. (5) of AdderNets be \partial \|x\|_2 / \partial x = x / \|x\|_2? (Please refer to the p-norm subsection under Examples; see the sketch after this list.)

If I have misunderstood anything, please correct me. Thanks.
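
For item 2, the standard Euclidean-norm derivative being referenced follows from the chain rule (x is a vector and \|x\|_2 the usual 2-norm):

```latex
\[
  \|x\|_2 = \Bigl(\sum_i x_i^2\Bigr)^{1/2}
  \quad\Longrightarrow\quad
  \frac{\partial \|x\|_2}{\partial x_j}
  = \frac{x_j}{\bigl(\sum_i x_i^2\bigr)^{1/2}}
  = \frac{x_j}{\|x\|_2} .
\]
```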

  1. Yes, so we ultimately use the L1-AdderNet in the main paper; the L2-AdderNet is proposed only for investigation.

  2. We in fact use the squared L2 distance (L2^2), as defined in our supplementary material, so there is no square root and no \|x\|_2 denominator in the derivative.
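
To make the distinction concrete, here is a minimal, self-contained sketch of the scheme discussed in this thread (it is not the repository's adder.py; the function name and tensor values are made up for illustration):

```python
import torch

def adder_output_and_filter_grad(x, f):
    """Toy 1-D illustration: the forward pass scores an input patch `x`
    against a filter `f` with the negative L1 distance, while the
    backward pass uses the full-precision gradient x - f, i.e. the
    derivative of the squared-L2 distance (up to a constant factor),
    rather than the true L1 derivative sign(x - f)."""
    y = -(x - f).abs().sum()  # forward: L1-AdderNet output
    grad_f = x - f            # surrogate gradient w.r.t. the filter
    return y, grad_f

# Tiny usage example with made-up values.
x = torch.tensor([0.5, -1.0, 2.0])
f = torch.zeros(3)
y, grad_f = adder_output_and_filter_grad(x, f)
print(y.item())   # -3.5
print(grad_f)     # tensor([ 0.5000, -1.0000,  2.0000])
```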

Thanks for your detailed explanation.