There are 19 repositories under the visual-question-answering topic.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Implementation of 🦩 Flamingo, DeepMind's state-of-the-art few-shot visual question answering attention network, in PyTorch
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
A collection of resources on applications of multi-modal learning in medical imaging.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Strong baseline for visual question answering
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
PyTorch implementation of the winning entry from the VQA Challenge Workshop at CVPR'17
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
Document Visual Question Answering
A collection of computer vision projects & tools.
A PyTorch implementation of "A simple neural network module for relational reasoning", evaluated on the CLEVR dataset
CNN+LSTM, attention-based, and MUTAN-based models for Visual Question Answering
Bottom-up feature extractor implemented in PyTorch.
[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph
PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Code for the COLING 2018 paper "Learning Semantic Sentence Embeddings using Pair-wise Discriminator"
PyTorch implementation of FiLM: Visual Reasoning with a General Conditioning Layer