vqav2

There are 8 repositories under the vqav2 topic.

rentainhe / TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
vqav2 iccv2021 transformer clevr multi-modal vision-and-language visual-question-answering pytorch multi-scale-features dynamic-network attention local-and-global multi-modality visualization multi-modal-learning official
Language:Python 63
vtu81 / NaiveVQA
A Visual Question Answering model implemented in MindSpore and PyTorch. The model is a reimplementation of the paper *Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering*. It is our final project for the DL4NLP course at ZJU.
mindspore pytorch vqa deep-learning vqav2 nlp
Language:Jupyter Notebook 8
phiyodr / vqaloader
PyTorch DataLoader for many VQA datasets
dataloader gqa pytorch textvqa vqa vqav2
Language:Python 6
williamcfrancis / Visual-Question-Answering-using-Stacked-Attention-Networks
PyTorch implementation of VQA using Stacked Attention Networks: a multimodal architecture that takes an image and a question as input, using a CNN and an LSTM with stacked attention layers for improved accuracy (54.82%). Includes visualization of the attention layers. Contributions welcome. Uses the VQA v2.0 dataset.
computer-vision deep-learning natural-language-processing pytorch stacked-attention-networks visual-question-answering vqa vqav2
Language:Jupyter Notebook 5
itsShnik / adaptively-finetuning-transformers
Adaptively fine-tuning transformer-based models for multiple domains and multiple tasks
transformers finetuning vlbert lxmert vision-and-language visual-question-answering pytorch vqav2 vqacpv2 spottune blockdrop
Language:Python 4
BrightQin / RWSAN
Official implementation of "Deep Residual Weight-Sharing Attention Network with Low-Rank Attention for Visual Question Answering" (RWSAN) published in the IEEE Transactions on Multimedia (TMM), 2022.
pytorch vqav2
Language:Python 3
rentainhe / TRAR-Feature-Extraction
Grid feature extraction for the ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
visual-question-answering iccv2021 extract-features pytorch vqav2 vqa vqa-dataset
Language:Python 2
shreyas21563 / VQA-using-BLIP
Leveraging the BLIP Model for Visual Question Answering: A Comparative Analysis on VQA and DAQUAR Datasets
blip computer-vision image-captioning inference machine-learning natural-language-processing visual-question-answering vqav2 daquar accuracy bert-score bleu-score wups
Language:Jupyter Notebook