yhhhli / BRECQ

PyTorch implementation of BRECQ, ICLR 2021

Some questions about implementation details

AndreevP opened this issue · comments

Hello, thank you for an interesting paper and nice code.

I have two questions concerning implementation details.

  1. Does the "one-by-one" block reconstruction mentioned in the paper mean that the input to each block comes from the already quantized preceding blocks, i.e. each block may correct quantization errors coming from previous blocks? Or is the input to each block collected from the full-precision model?
  2. Am I correct in my understanding that in the block-wise reconstruction objective you use the gradients for each object in the calibration sample independently (i.e. no gradient averaging or anything similar, as in the Adam-style treatment mentioned in the paper)? Also, what is happening here in data_utils.py — why do you add 1.0 to the gradients?
cached_grads = cached_grads.abs() + 1.0
# scaling to make sure its mean is 1
# cached_grads = cached_grads * torch.sqrt(cached_grads.numel() / cached_grads.pow(2).sum())
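For context, here is a minimal sketch of how such a cached-gradient weight could enter a block-wise reconstruction loss. This is an interpretation of the snippet above, not the repository's exact code; the names `weighted_recon_loss`, `q_out`, `fp_out`, and `grads` are hypothetical. One plausible reading of the `+ 1.0` is that it floors every element's weight at 1, so output elements with near-zero gradient still contribute a plain squared error instead of being ignored entirely.

```python
import torch

def weighted_recon_loss(q_out, fp_out, grads):
    """Gradient-weighted block reconstruction loss (illustrative sketch).

    q_out  : output of the quantized block
    fp_out : output of the full-precision block
    grads  : cached gradients of the task loss w.r.t. the block output
    """
    # abs() + 1.0 keeps every weight >= 1: elements whose gradient is
    # ~0 still incur an ordinary squared error rather than vanishing.
    weight = grads.abs() + 1.0
    return ((q_out - fp_out).pow(2) * weight).mean()

# Toy usage with random calibration tensors
fp = torch.randn(4, 8)
q = fp + 0.01 * torch.randn(4, 8)
g = torch.randn(4, 8)
loss = weighted_recon_loss(q, fp, g)
```

Under this reading, the weighting biases the reconstruction toward output elements the task loss is most sensitive to, while the additive 1.0 prevents the objective from collapsing where gradients happen to be zero on the calibration set.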

Thank you for your time and consideration!

Hi, I also found point 2 confusing. Have you figured out the rationale behind it?