Inquiry Regarding Your Awesome work "Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering"

Question

jian0805 opened this issue 5 months ago · comments

Hello,
We have recently been trying to reproduce the "Vision-Language Alignment" component described in your paper "Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering".

Could you kindly inform us about which specific fields of the WIT (Wikipedia-based Image Text) dataset were utilized in your research? Additionally, we are curious about the volume of data that was employed for training purposes.

If permissible, we would be very grateful for access to the relevant portion of your code or model weights (the mapping network F_M of the DPR baseline and FLMR). This would significantly aid our research, and we assure you that your work will be credited and cited in any of our subsequent publications or presentations in this field.

We understand the value of your work and appreciate any assistance you can provide. Thank you very much for considering our request.

Lin Weizhe · Answer 1 · Sat Jan 20 2024 16:39:46 GMT+0800 (China Standard Time)

We used the first 5 splits of WIT's training set:
wit_v1.train.all-00000-of-00010.tsv to wit_v1.train.all-00004-of-00010.tsv
Here is a short script for processing WIT fields into passages:

def process_example(item):
    """Concatenate the WIT text fields of one example into a single passage."""
    # The page title is always included.
    passage_content = f"title: {item['page_title']}"
    # Optional fields are appended only when they are present.
    if item['section_title'] is not None:
        passage_content += f" section title: {item['section_title']}"
    if item['hierarchical_section_title'] is not None:
        passage_content += f" hierarchical section title: {item['hierarchical_section_title']}"
    if item['caption_reference_description'] is not None:
        passage_content += f" caption reference description: {item['caption_reference_description']}"
    if item['caption_attribution_description'] is not None:
        passage_content += f" caption attribution description: {item['caption_attribution_description']}"
    if item['caption_alt_text_description'] is not None:
        passage_content += f" caption alt text description: {item['caption_alt_text_description']}"

    # The page description is appended unconditionally.
    passage_content += f" content: {item['context_page_description']}"

    item['passage_content'] = passage_content

    return item
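
For anyone reproducing this, the following is a minimal sketch of how the split names and the passage format above might be wired together. It is not the authors' pipeline. The helper names `split_names`, `read_rows`, and `build_passage` are hypothetical. The sketch assumes the splits are plain tab-separated files with a header row and that empty cells should be treated as missing (None). `build_passage` is a loop-based rewrite of `process_example` that also treats absent columns as missing, which the original script does not do.

```python
import csv
import io

# Optional WIT fields, in the order used by the script above.
OPTIONAL_FIELDS = [
    "section_title",
    "hierarchical_section_title",
    "caption_reference_description",
    "caption_attribution_description",
    "caption_alt_text_description",
]


def split_names(num_splits=5, total=10):
    """Return the file names of the first `num_splits` WIT training splits."""
    return [
        f"wit_v1.train.all-{i:05d}-of-{total:05d}.tsv" for i in range(num_splits)
    ]


def read_rows(handle):
    """Yield rows of a tab-separated file, mapping empty cells to None (assumption)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {key: (value if value != "" else None) for key, value in row.items()}


def build_passage(item):
    """Build the passage string in the same format as process_example."""
    parts = [f"title: {item['page_title']}"]
    for field in OPTIONAL_FIELDS:
        # Absent columns are treated like None here (assumption).
        if item.get(field) is not None:
            parts.append(f"{field.replace('_', ' ')}: {item[field]}")
    parts.append(f"content: {item['context_page_description']}")
    return " ".join(parts)


# Tiny in-memory example standing in for one real split file.
sample = (
    "page_title\tsection_title\tcontext_page_description\n"
    "Paris\t\tCapital of France.\n"
)
rows = list(read_rows(io.StringIO(sample)))
passage = build_passage(rows[0])
print(passage)  # title: Paris content: Capital of France.
```

To run this on real data, replace the in-memory sample with an opened file for each name returned by `split_names()`. Real WIT rows may contain quote characters or very long cells, so the `csv` reader settings may need adjusting.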