There are 26 repositories under the vqa topic.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
A summary of research and applications in intelligent question answering by the NLP research team at Beihang University's Beijing Advanced Innovation Center for Big Data and Brain Computing. Covers knowledge-base question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC), with surveys of both academic and industrial work for each task.
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Visual Question Answering in PyTorch
[ICCV 2021 Oral] Official PyTorch implementation of "Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers", a novel method for visualizing any Transformer-based network, including examples for DETR and VQA.
A curated list of Visual Question Answering (VQA) (image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks.
Strong baseline for visual question answering
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Multimodal Question Answering in the Medical Domain: A summary of Existing Datasets and Systems
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Notes on some computer-vision papers I have read, covering image captioning, weakly supervised segmentation, and more.
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
TensorFlow implementation of Deeper LSTM + normalized CNN for Visual Question Answering
Using pretrained encoder and language models to generate captions from multimedia inputs.
Code release for the ICLR 2023 paper on SlotFormer, an object-centric dynamics model
This project is out of date; I don't remember the details inside...
Towards World's Most Comprehensive Curated List of LLM Related Papers & Repositories