Japanese Visual Genome VQA dataset

We created a Japanese visual question answering (VQA) dataset using Yahoo! Crowdsourcing, based on images from the Visual Genome dataset. The dataset is designed to be comparable to the free-form QA portion of the Visual Genome dataset. It consists of 99,208 images and 793,664 Japanese QA pairs, with eight QA pairs per image.

Annotation Format

The annotations are stored in a single JSON file whose format is a subset of the Visual Genome dataset v1.2 format.
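As a rough illustration of working with the annotation file, the sketch below loads it and tallies images and QA pairs. It assumes the file mirrors the layout of Visual Genome v1.2's question_answers.json: a JSON list of records shaped like {"id": ..., "qas": [...]}, where each QA entry carries at least "image_id", "question" and "answer". Those field names, the helper names (load_annotations, iter_qa_pairs, summarize) and the command-line path argument are assumptions for illustration, not part of the dataset's documentation. The actual file name is not given here.

```python
import json
import sys
from collections.abc import Iterator
from typing import Any

# ASSUMPTION: the layout mirrors Visual Genome v1.2 question_answers.json,
# i.e. a JSON list of {"id": <image id>, "qas": [ {...}, ... ]} records,
# where each QA entry has at least "image_id", "question" and "answer".
Record = dict[str, Any]


def load_annotations(path: str) -> list[Record]:
    """Read the whole annotation file (UTF-8, since the QA text is Japanese)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_qa_pairs(records: list[Record]) -> Iterator[tuple[int, str, str]]:
    """Yield (image_id, question, answer) for every QA pair in every record."""
    for record in records:
        for qa in record.get("qas", []):
            yield qa["image_id"], qa["question"], qa["answer"]


def summarize(records: list[Record]) -> dict[str, int]:
    """Count images, QA pairs, and images that do not have exactly eight QA pairs."""
    qa_counts = [len(r.get("qas", [])) for r in records]
    return {
        "images": len(records),
        "qa_pairs": sum(qa_counts),
        "images_without_8_qas": sum(1 for n in qa_counts if n != 8),
    }


if __name__ == "__main__":
    # Usage: python summarize_vqa.py <path-to-annotation-json>  (hypothetical script name)
    print(summarize(load_annotations(sys.argv[1])))
```

If the format assumption holds, running this on the full file should report 99,208 images and 793,664 QA pairs (99,208 × 8), with no image lacking its eight QA pairs. Any mismatch would suggest the structure differs from the assumed layout.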

License

Creative Commons Attribution 4.0 License

Citation

@InProceedings{C18-1163,
  author = 	"Shimizu, Nobuyuki
		and Rong, Na
		and Miyazaki, Takashi",
  title = 	"Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps",
  booktitle = 	"Proceedings of the 27th International Conference on Computational Linguistics",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1918--1928",
  location = 	"Santa Fe, New Mexico, USA",
  url = 	"http://aclweb.org/anthology/C18-1163"
}
