DAMO-NLP-SG / VCD

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

If the confidence levels for the original and distorted inputs are both high and similar, can the plausibility constraint have a negative effect?

QiushiYang opened this issue

The overall idea of this paper is really interesting! However, I am a bit confused about the specific design of the adaptive plausibility constraint. Contrasting the two predictions on its own can help with bias from the LLM or the VLM datasets, i.e., tokens that stay highly confident under the distorted input, but it may have negative effects in other cases:

(1) For a correct prediction on the original input, if the predictions for the original and distorted inputs are similar, contrasting suppresses it, giving a false negative;
(2) For an incorrect prediction (i.e., a hallucination), if it does not come from LLM or VLM bias and the two predictions are dissimilar (the distorted-input confidence is smaller), contrasting boosts it, giving a false positive.

The adaptive plausibility constraint selects high-confidence predictions for contrasting. However, if the confidence levels in the two cases above are high, these false positives and false negatives are selected as well, which makes the constraint misleading.

Moreover, I understand that if the threshold is very high, only the highest-confidence prediction remains. However, I notice that beta is set to 0.1, so it seems that multiple candidates usually remain (right?), and the false positives and false negatives may still exist. If beta is set so high that only the top prediction remains, is VCD then equivalent to max(logit) on most samples?
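
To make the question concrete, here is a minimal sketch of one decoding step as I understand it from the paper. It is not the repository's code. It assumes the constraint keeps tokens whose original-input probability is at least beta times the maximum, and that the kept tokens are scored by (1 + alpha) * logit_original - alpha * logit_distorted. The function name `vcd_next_token`, the toy logits, and the greedy pick (instead of sampling) are my own simplifications.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vcd_next_token(logits_orig, logits_dist, alpha=1.0, beta=0.1):
    """Hypothetical helper: one greedy VCD step on toy logits.

    Returns (chosen token index, list of token indices kept by the constraint).
    """
    p_orig = softmax(logits_orig)
    # Assumed plausibility constraint: keep tokens with p >= beta * max(p),
    # measured on the ORIGINAL input only.
    cutoff = beta * max(p_orig)
    kept = [i for i, p in enumerate(p_orig) if p >= cutoff]
    # Assumed contrastive score, computed only for the kept tokens.
    contrast = {
        i: (1 + alpha) * logits_orig[i] - alpha * logits_dist[i] for i in kept
    }
    chosen = max(contrast, key=contrast.get)
    return chosen, kept


# Toy example (made-up numbers): token 0 is the top prediction on the
# original input and stays confident under the distorted input.
logits_orig = [2.0, 1.5, -3.0]
logits_dist = [2.5, 0.5, -3.0]

print(vcd_next_token(logits_orig, logits_dist, beta=0.1))
print(vcd_next_token(logits_orig, logits_dist, beta=1.0))
```

With beta = 0.1, tokens 0 and 1 both pass the constraint and the contrast flips the choice to token 1, which is the situation I am worried about when token 0 was actually correct. With beta = 1.0, only token 0 survives, so the step reduces to the argmax of the original logits.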

I am confused about the analysis above. Could you help me interpret it? Thanks a lot!