python deep deeplearning computer-vision cnn-model vqa vqa-dataset machine-learning

Strong baseline for visual question answering

This is a PyTorch re-implementation of Vahid Kazemi and Ali Elqursh's paper "Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering".

The paper shows that a relatively simple model, built only from common deep learning building blocks, can achieve higher accuracy than most previously published work on the popular VQA v1 dataset.
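To give a rough idea of what "common building blocks" can look like in a model of this kind, here is a minimal sketch, assuming pre-extracted CNN image features, an LSTM question encoder, and soft attention with several glimpses over image regions. The class name `AttentionVQASketch`, every layer size, and the exact way the question and image features are fused are assumptions made for illustration; this is not the code in this repository and not necessarily the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionVQASketch(nn.Module):
    """Illustrative only: hypothetical names and sizes, not this repo's model."""

    def __init__(self, vocab_size=1000, embed_dim=64, q_dim=128,
                 v_dim=256, mid_dim=128, glimpses=2, num_answers=100):
        super().__init__()
        # Question encoder: word embedding followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, q_dim, batch_first=True)
        # Attention: project image and question features to a shared space.
        self.v_proj = nn.Conv2d(v_dim, mid_dim, kernel_size=1)
        self.q_proj = nn.Linear(q_dim, mid_dim)
        self.att = nn.Conv2d(mid_dim, glimpses, kernel_size=1)
        # Classifier over a fixed set of candidate answers.
        self.classifier = nn.Sequential(
            nn.Linear(glimpses * v_dim + q_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_answers),
        )

    def forward(self, v, q_tokens):
        # v: (B, v_dim, H, W) image feature map; q_tokens: (B, T) word indices.
        # Sequence packing for variable-length questions is omitted for brevity.
        _, (h, _) = self.lstm(self.embed(q_tokens))
        q = h[-1]                                        # (B, q_dim)
        # Broadcast the question vector over every spatial location.
        joint = F.relu(self.v_proj(v) + self.q_proj(q)[:, :, None, None])
        logits = self.att(joint)                         # (B, G, H, W)
        B, G, H, W = logits.shape
        # One softmax per glimpse over all H*W image regions.
        weights = F.softmax(logits.reshape(B, G, H * W), dim=-1)
        v_flat = v.reshape(B, -1, H * W)                 # (B, v_dim, H*W)
        # Attention-weighted average of region features for each glimpse.
        attended = torch.einsum('bgn,bdn->bgd', weights, v_flat)
        combined = torch.cat([attended.reshape(B, -1), q], dim=1)
        return self.classifier(combined), weights


# Usage with random stand-in data (no real images or questions).
model = AttentionVQASketch()
v = torch.randn(2, 256, 14, 14)
q_tokens = torch.randint(1, 1000, (2, 8))
scores, weights = model(v, q_tokens)
```

Here `scores` holds one row of answer logits per example, and `weights` holds one attention distribution over the 14 × 14 regions per glimpse, which can be reshaped and plotted to inspect where the model looks.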

A fully trained model (convergence shown below) is available for download.

Note that the model in my other VQA repo performs better than the model implemented here.

This project uses the code provided here
