diviz-mit / visuallydata

A large-scale curated dataset of Visual.ly infographics with metadata and additional crowdsourced annotations for research applications in computer vision and natural language processing.

Home Page: http://visdata.mit.edu


Visually29K: a large-scale curated infographics dataset

In this repository, we provide metadata, annotations, and processing scripts for tens of thousands of infographics, for computer vision and natural language research. This data can be used for applications such as category (topic) prediction, tagging, popularity prediction (likes and shares), text understanding and summarization, captioning, icon and object detection, and design summarization and retargeting. We provide starter code for a subset of these applications, along with metadata (including text detections, icon detections, and tag and category annotations) in different formats, to make the data easy to use and adaptable to different tasks.

This repo is associated with the following project page: http://visdata.mit.edu/ and the manuscripts: "Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics" and "Understanding infographics through textual and visual tag prediction".

Infographics & metadata

Crowdsourced icon annotations

Icon dataset

¹ Links to large data files that could not be stored directly in this repository can be found in links_to_data_files.md

Starter files and scripts

  • howto.ipynb shows how to parse the metadata for the infographics. Note that we do not provide the infographics themselves, as they are the property of Visual.ly, but we do provide URLs for the infographics and a way to obtain them. We also provide the train/test splits that we used for category and tag prediction. The metadata contains attributes that we did not use for our prediction tasks, including popularity indicators (likes, shares, views) and designer-provided titles and captions. A minimal metadata-loading sketch follows this list.
  • plot_text_detections.ipynb plots detected and parsed text (via Google's OCR API) on top of the infographics, and demonstrates the different formats in which the parsed text data can be loaded. This text can be a rich resource for natural language processing tasks like captioning and summarization.
  • plot_icon_detections.ipynb loads our automatically computed icon detections and classifications for 63K infographics (note that for reporting purposes, only the results on the test set of the 29K subset of infographics are used). These detections and classifications can be used either as a baseline to improve upon or directly as input to new tasks like captioning, retargeting, or summarization. A sketch of overlaying detection boxes on an infographic follows this list.
  • plot_human_annotations.ipynb loads data for 1.4K infographics that we collected through crowdsourced annotation tasks on Amazon's Mechanical Turk. Specifically, we asked participants to annotate the locations of icons inside the infographics. Additionally, human_annotation_consistency.ipynb provides scripts for computing consistency between participants on this annotation task (an IoU-based consistency sketch follows this list). This data is meant to be used as ground truth for evaluating computational models.
  • save_tag_to_relevant_infographics.ipynb contains scripts to find and plot the infographics that match different text queries, for a demo retrieval application. Search engines typically use metadata to determine which images to serve for a search query; they do not look inside the image. In contrast, our automatically pre-computed detections let us find the infographics that contain matching text and icons. A toy retrieval sketch follows this list.
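
As a minimal sketch of the metadata-loading step, the snippet below assumes the metadata is a JSON file keyed by infographic id, with fields like url, category, and tags, and a plain-text train-split file; the actual file names and keys are shown in howto.ipynb and may differ.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical file names and keys -- see howto.ipynb for the real ones.
METADATA_FILE = "metadata.json"
TRAIN_SPLIT_FILE = "train_split.txt"

with open(METADATA_FILE) as f:
    metadata = json.load(f)  # assumed: {infographic_id: {"url": ..., "category": ..., "tags": [...]}}

train_ids = Path(TRAIN_SPLIT_FILE).read_text().split()

# Download the infographics from the provided URLs (they are not redistributed in this repo).
out_dir = Path("infographics")
out_dir.mkdir(exist_ok=True)
for infographic_id in train_ids[:10]:  # first few, as a smoke test
    entry = metadata[infographic_id]
    dest = out_dir / f"{infographic_id}.jpg"
    if not dest.exists():
        urllib.request.urlretrieve(entry["url"], dest)
    print(infographic_id, entry.get("category"), entry.get("tags"))
```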
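
The detection-plotting sketch below overlays labeled boxes on an infographic with matplotlib, assuming each detection is a dict with a pixel-coordinate bounding box and a label; the exact schema is demonstrated in plot_text_detections.ipynb and plot_icon_detections.ipynb.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

def plot_detections(image_path, detections):
    """Draw labeled boxes over an infographic.

    `detections` is assumed to be a list of dicts like
    {"x": ..., "y": ..., "w": ..., "h": ..., "label": ...} in pixel coordinates;
    adapt the keys to the formats provided in the notebooks.
    """
    image = Image.open(image_path)
    fig, ax = plt.subplots(figsize=(8, 12))
    ax.imshow(image)
    for det in detections:
        rect = patches.Rectangle((det["x"], det["y"]), det["w"], det["h"],
                                 fill=False, edgecolor="red", linewidth=2)
        ax.add_patch(rect)
        ax.text(det["x"], det["y"] - 5, det["label"], color="red", fontsize=8)
    ax.axis("off")
    plt.show()
```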
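
As one simple measure of annotator consistency, each icon box drawn by one participant can be matched to its best-overlapping box from another participant. The sketch below is a minimal IoU-based version of this idea; the metric actually used in human_annotation_consistency.ipynb may differ.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h) in pixels."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union else 0.0

def mean_best_iou(boxes_a, boxes_b):
    """Match each box from one annotator to its best-overlapping box from another."""
    if not boxes_a or not boxes_b:
        return 0.0
    return sum(max(iou(a, b) for b in boxes_b) for a in boxes_a) / len(boxes_a)
```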
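
The toy retrieval sketch below illustrates the idea of looking inside the image: it assumes per-infographic sets of detected words and icon class names (as produced by the detection notebooks) and returns the infographics that match a query by exact token. The full scripts live in save_tag_to_relevant_infographics.ipynb.

```python
def matching_infographics(query, text_index, icon_index):
    """Return ids of infographics whose detected text or icon classes contain the query.

    `text_index` and `icon_index` are assumed to be dicts mapping infographic id ->
    set of lower-cased detected words / icon class names (exact-match for simplicity).
    """
    query = query.lower()
    hits = []
    for infographic_id in text_index.keys() | icon_index.keys():
        if (query in text_index.get(infographic_id, set())
                or query in icon_index.get(infographic_id, set())):
            hits.append(infographic_id)
    return hits

# Example: find infographics that mention or depict "coffee".
# print(matching_infographics("coffee", text_index, icon_index))
```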

Featured projects

  • featured_projects.md contains links to other repositories that use our Visually29K dataset, to seed new ideas and give students and researchers potential starting points for projects.

If you use the data or code in this git repo, please consider citing:

@inproceedings{visually2,
    author    = {Spandan Madan* and Zoya Bylinskii* and Matthew Tancik* and Adrià Recasens and Kimberli Zhong and Sami Alsheikh and Hanspeter Pfister and Aude Oliva and Fredo Durand},
    title     = {Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics},
    booktitle = {arXiv preprint arXiv:1807.10441},
    url       = {https://arxiv.org/pdf/1807.10441},
    year      = {2018}
}
@inproceedings{visually1,
    author    = {Zoya Bylinskii* and Sami Alsheikh* and Spandan Madan* and Adrià Recasens* and Kimberli Zhong and Hanspeter Pfister and Fredo Durand and Aude Oliva},
    title     = {Understanding infographics through textual and visual tag prediction},
    booktitle = {arXiv preprint arXiv:1709.09215},
    url       = {https://arxiv.org/pdf/1709.09215},
    year      = {2017}
}


License: MIT License

