fcdl94 / MiB

Official code for Modeling the Background for Incremental Learning in Semantic Segmentation https://arxiv.org/abs/2002.00718

Question about init_balanced: why is the "-log(N+1)" operation needed in bias initialization?

EricKani opened this issue · comments

same as title~

Thanks!

Hi @EricKani.
The goal is to distribute the "old" background probability to the "new" background and to the novel classes.
To do so, we need to (1) set the weights of the novel classes equal to those of the background, and (2) shift the biases of both the novel classes and the background so that new_probability = old_bkg_probability / (|new_classes| + 1), where the +1 accounts for the bkg class.
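A minimal sketch of what this looks like for a plain linear classifier head (the helper name `balanced_init` and the arguments `novel_idx` / `bkg_idx` are illustrative, not the repo's actual variables):

```python
import torch

@torch.no_grad()
def balanced_init(cls: torch.nn.Linear, novel_idx: list, bkg_idx: int = 0):
    """Hypothetical helper: split the old background probability evenly
    over the (new) background and the novel classes."""
    shift = torch.log(torch.tensor(len(novel_idx) + 1.0))
    # (1) novel classes start as copies of the background classifier
    cls.weight[novel_idx] = cls.weight[bkg_idx]
    # (2) subtracting log(N+1) from a logit divides its softmax
    #     probability by N+1; apply it to background and novel classes
    cls.bias[novel_idx] = cls.bias[bkg_idx] - shift
    cls.bias[bkg_idx] -= shift
```

With this init, an input that previously received background logit z_b now gives logit z_b - log(N+1) to the background and to each novel class, so the softmax denominator is unchanged and the old background mass is split N+1 ways.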

You can see equations 8 and 9 in the paper. In Eq. 9, we use -log(|C^t|) since |C^t| equals |new_classes| + 1.
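For readers without the paper at hand, the initialization can be paraphrased like this (my notation, not verbatim from Eqs. 8-9; b denotes the background class):

```latex
\omega_c = \omega_b ,
\qquad
\beta_c = \beta_b - \log \lvert C^t \rvert ,
\qquad
\forall\, c \in C^t ,
```

where |C^t| counts the background together with the novel classes, i.e. |C^t| = |new_classes| + 1.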

Hi @EricKani.
Is it clear why I used that equation? Can I close the issue?

Sorry for the late response... I understand how the implementation works to a large extent. Thanks a lot~
But I still don't understand the role of the log here... Could you explain it in more detail? Thank you very much!

Hi @EricKani,
Sorry for the very late response.
The role of the log is to split the old background probability between the new background and the novel classes. It follows from imposing that the old background probability be divided equally among them: since the softmax is proportional to exp(logit), dividing a class probability by |C^t| is the same as subtracting log(|C^t|) from its logit, which is where the -log term in the bias comes from.
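A quick numerical check (plain PyTorch, nothing repo-specific) that subtracting log(N+1) from the copied background logit splits the old background probability exactly N+1 ways while leaving the other classes untouched:

```python
import torch

torch.manual_seed(0)
old_logits = torch.randn(5)            # 5 old classes, index 0 = background
p_old = old_logits.softmax(dim=0)

N = 3                                  # number of novel classes
shift = torch.log(torch.tensor(N + 1.0))
new_logits = torch.cat([
    old_logits[:1] - shift,            # background, shifted
    old_logits[1:],                    # other old classes, untouched
    old_logits[:1].repeat(N) - shift,  # novel classes = bkg copies, shifted
])
p_new = new_logits.softmax(dim=0)

# background and each novel class get p_old[0] / (N + 1) ...
print(p_old[0] / (N + 1), p_new[0], p_new[-N:])
# ... and the old non-background probabilities are unchanged
print(torch.allclose(p_old[1:], p_new[1:5]))   # True
```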