magic-research / PLLaVA

Official repository for the paper PLLaVA

Recaption bug.

guyuchao opened this issue · comments

Hi, I tried the recaption in eval.sh. At line 220 (single_test), the model accurately outputs the caption, as shown in the figure below. But when captioning Inter4K (line 134), the model's output just repeats the system prompt. I didn't modify any part of the given code; I only set print_res=True.

Correct output by single_test:
Screenshot 2024-05-02 at 9:38:14 PM

Incorrect output by infer_recaption:
Screenshot 2024-05-02 at 9:37:56 PM

This is my inference script:
Screenshot 2024-05-02 at 9:56:50 PM

If I set conv_mode=plain instead of eval_recaption in infer_caption, the model outputs the answer correctly. It seems the system prompt is not functioning correctly. Do you have any idea how to fix this bug?
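As a sanity check on where the system prompt enters, here is a minimal sketch of how LLaVA-style conversation templates typically assemble the prompt from a conv_mode. The Conversation class, conv_templates dict, and get_prompt method are assumptions for illustration, not the actual PLLaVA code; the point is just that switching eval_recaption for plain swaps the system string that gets prepended.

```python
from dataclasses import dataclass, field

# Hypothetical LLaVA-style conversation template; names are illustrative,
# not the actual PLLaVA API.
@dataclass
class Conversation:
    system: str                       # system prompt prepended to every query
    roles: tuple = ("USER", "ASSISTANT")
    messages: list = field(default_factory=list)
    sep: str = " "

    def append_message(self, role: str, text: str) -> None:
        self.messages.append((role, text))

    def get_prompt(self) -> str:
        # A malformed system string for a given mode can make the model
        # echo the prompt back instead of answering, as observed above.
        parts = [self.system] if self.system else []
        for role, text in self.messages:
            parts.append(f"{role}: {text}" if text else f"{role}:")
        return self.sep.join(parts)

conv_templates = {
    "plain": Conversation(system=""),
    "eval_recaption": Conversation(system="Describe the video in detail."),
}

conv = conv_templates["eval_recaption"]
conv.append_message("USER", "<video> Describe this video.")
conv.append_message("ASSISTANT", "")  # empty slot the model should complete
print(conv.get_prompt())
```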

Another bug I found is in your eval.sh (7B): you don't pass the conv_mode argument, so the different settings all use the same default conv_mode="eval_videoqabench".

Screenshot 2024-05-02 at 10:27:32 PM

With print_res, we print out everything the language model outputs at LM OUTPUT TEXT (including the prompt and the user query). This is to monitor the correctness of inference. The real answer should be at the end of the LM output text, probably after "ASSISTANT:". Can you see that answer?
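For reference, a minimal sketch of pulling the answer out of that full decoded text. Only the "ASSISTANT:" separator comes from the comment above; the helper name and everything else are illustrative assumptions, not the actual PLLaVA code.

```python
def extract_answer(lm_output_text: str, sep: str = "ASSISTANT:") -> str:
    """Return only the model's answer from the full decoded output.

    print_res dumps the whole decoded sequence (system prompt, user query,
    and answer); the answer, if present, follows the last "ASSISTANT:".
    """
    if sep not in lm_output_text:
        return ""  # the model never produced an answer turn
    # Split on the last occurrence in case the prompt itself contains the marker.
    return lm_output_text.rsplit(sep, 1)[-1].strip()

full = "SYSTEM ... USER: <video> Describe the video. ASSISTANT: A dog runs on a beach."
print(extract_answer(full))  # -> A dog runs on a beach.
```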

There is no answer in the LM output text, only some repetition of the system prompt.

Hi,

Regarding the second bug: correct, conv_mode can be passed in here, but each script also has its own default conv_mode, so using the default is fine when evaluating the 7B and 13B models. The 34B model, however, uses a different prompt format, so it has to use the Yi prompting.
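To make the default-argument point concrete, here is a hypothetical sketch of the interaction: the --conv_mode flag and the eval_videoqabench default are taken from this thread, everything else is assumed.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--conv_mode",
    type=str,
    default="eval_videoqabench",  # per-script default; fine for 7B/13B video QA
    help="conversation template, e.g. eval_recaption for recaptioning, "
         "or a Yi-style template for the 34B model",
)

# eval.sh (7B) passes no --conv_mode, so every task gets the default:
print(parser.parse_args([]).conv_mode)  # eval_videoqabench
# What the recaption task should receive instead:
print(parser.parse_args(["--conv_mode", "eval_recaption"]).conv_mode)
```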

That might be a bug then; I'll look into it.

Fixed in #15. Please check whether it correctly fixes this problem.

Fixed, thank you.