Add VLFeedback, the first DPO dataset for LVLMs

Question

TobiasLee opened this issue 6 months ago

Hi authors, thanks for your great survey and the curated paper list, which we cite in our related work as a reference for a detailed introduction to LVLMs.

I'd like to recommend our recent work exploring DPO for LVLMs for the list (in the Dataset part, or perhaps a new RLHF section that also includes RLHF-V and LLaVA-RLHF?). Using our VLFeedback dataset, annotated with GPT-4V on 80k high-quality multi-modal instructions, we found that DPO gives promising results on benchmarks such as MME and MMHal-bench.

Project Page: https://vlf-silkie.github.io/
Paper: https://arxiv.org/abs/2312.10665
Dataset: https://huggingface.co/datasets/MMInstruction/VLFeedback
Code: https://github.com/vlf-silkie/VLFeedback

xjtupanda · Answer 1 · Thu Dec 21 2023 15:34:38 GMT+0800 (China Standard Time)

Thanks for sharing and for the citation! We've incorporated your work into our repo.

Lei Li · Answer 2 · Thu Dec 21 2023 15:49:17 GMT+0800 (China Standard Time)

Thank you for including our work! 🎉🎉
We have provided the GPT-4V annotation guide in the main paper and appendix. We will upload it to the code repo soon.