Explanation for NUM_ZERO, ORTHO and ORTHO_v2
haofanwang opened this issue · comments
As the title says, I cannot find any details about these inference tricks in the paper. In particular, you use different settings for fidelity and for extreme style.
Here is my understanding; I am not sure whether it is correct.
(1) For NUM_ZERO, you append some zero tokens so that the query is able to discard ID information (maybe to better keep the background uncontaminated? But this only happens implicitly.)
(2) For ORTHO or ORTHO_v2, you calculate the projection of id_hidden_states onto hidden_states, and then use orthogonal = id_hidden_states - projection to obtain more disentangled ID information. Is this an experimental finding?
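
To make the two guesses above concrete, here is a minimal plain-Python sketch of what I think is going on. It is not taken from the repository: the function names `id_attention_mass` and `orthogonal_component` are hypothetical, vectors are plain lists rather than tensors, and I am assuming that a zero token yields an all-zero key (for example, a key projection without bias), so its attention logit is exactly 0.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def id_attention_mass(query, id_keys, num_zero):
    """Total softmax weight on the real ID tokens when `num_zero`
    all-zero keys are appended (assumption: zero keys give logit 0)."""
    logits = [dot(query, k) for k in id_keys] + [0.0] * num_zero
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    return sum(exps[:len(id_keys)]) / sum(exps)


def orthogonal_component(id_hidden, hidden):
    """Return id_hidden minus its projection onto hidden."""
    denom = dot(hidden, hidden)
    if denom == 0.0:
        # Nothing to project onto; leave the ID features unchanged.
        return list(id_hidden)
    scale = dot(id_hidden, hidden) / denom
    projection = [scale * h for h in hidden]
    return [i - p for i, p in zip(id_hidden, projection)]


# (1) A query that does not match the single ID key: without zero tokens
# the softmax still puts all of its weight on that key; with zero tokens
# most of the weight can go to the zeros instead.
query = [1.0, 0.0]
id_keys = [[-2.0, 0.0]]
mass_without_zeros = id_attention_mass(query, id_keys, num_zero=0)
mass_with_zeros = id_attention_mass(query, id_keys, num_zero=8)

# (2) The part of id_hidden that is orthogonal to hidden.
hidden = [1.0, 0.0]
id_hidden = [3.0, 4.0]
orthogonal = orthogonal_component(id_hidden, hidden)
```

If this reading is right, `mass_without_zeros` is 1 no matter how poorly the query matches the ID key, while `mass_with_zeros` becomes small, and `orthogonal` has zero inner product with `hidden`, so adding it does not change the component that `hidden_states` already carries.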