The Common Visual Data Foundation is a 501(c)(3) non-profit organization with a mission to enable open community-driven research in computer vision.

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.


A dataset of 5 million images depicting human-made and natural landmarks, spanning 200 thousand classes.


The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips, with each action localized in space and time. This yields 1.65M action labels, and a single person frequently carries multiple labels.


The MNIST database of handwritten digits is one of the most popular image recognition datasets. It contains 60k examples for training and 10k examples for testing.