There are 0 repository under vqav2 topic.
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
A Visual Question Answering model implemented in MindSpore and PyTorch. The model is a reimplementation of the paper *Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering*. It's our final project for course DL4NLP at ZJU.
Pytorch implementation of VQA using Stacked Attention Networks: Multimodal architecture for image and question input, using CNN and LSTM, with stacked attention layer for improved accuracy (54.82%). Includes visualization of attention layers. Contributions welcome. Utilizes Visual VQA v2.0 dataset.
Adaptively fine tuning transformer based models for multiple domains and multiple tasks
Official implementation of "Deep Residual Weight-Sharing Attention Network with Low-Rank Attention for Visual Question Answering" (RWSAN) published in the IEEE Transactions on Multimedia (TMM), 2022.
Grid features extraction for ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Leveraging the BLIP Model for Visual Question Answering: A Comparative Analysis on VQA and DAQUAR Datasets