MIT-Adobe FiveK Dataset

The MIT-Adobe FiveK Dataset [1] is a publicly available dataset providing the following items.

5,000 RAW images in DNG format
Retouched versions of each RAW image by five experts, in TIFF format (25,000 images, 16 bits per channel, ProPhoto RGB color space, lossless compression)
Semantic information about each image

The dataset was created by MIT and Adobe Inc., and is intended to provide a diverse and challenging set of images for testing image processing algorithms. The images were selected to represent a wide range of scenes, including landscapes, portraits, still lifes, and architecture. The images also vary in terms of lighting conditions, color balance, and exposure.

Official Website

License

LicenseAdobe.txt covers files listed in filesAdobe.txt
LicenseAdobeMIT.txt covers files listed in filesAdobeMIT.txt

Data Samples

Raw (DNG)	Categories	Camera Model
a0001-jmac_DSC1459.dng	{"location": "outdoor", "time": "day", "light": "sun_sky", "subject": "nature"}	Nikon D70
a1384-dvf_095.dng	{"location": "outdoor", "time": "day", "light": "sun_sky", "subject": "nature"}	Leica M8
a4607-050801_080948__I2E5512.dng	{"location": "indoor", "time": "day", "light": "artificial", "subject": "people"}	Canon EOS-1D Mark II
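
Each categories entry is a JSON object, so it can be parsed with the standard library. The following is a minimal sketch that tallies the three sample rows above; the `rows` list and the variable names are illustrative and not part of the dataset tools.

```python
import json
from collections import Counter

# Category strings copied from the sample table above.
rows = [
    '{"location": "outdoor", "time": "day", "light": "sun_sky", "subject": "nature"}',
    '{"location": "outdoor", "time": "day", "light": "sun_sky", "subject": "nature"}',
    '{"location": "indoor", "time": "day", "light": "artificial", "subject": "people"}',
]

categories = [json.loads(row) for row in rows]

# Count how often each value of a given key occurs.
subject_counts = Counter(c["subject"] for c in categories)
location_counts = Counter(c["location"] for c in categories)

print(subject_counts)   # Counter({'nature': 2, 'people': 1})
print(location_counts)  # Counter({'outdoor': 2, 'indoor': 1})
```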

References

@inproceedings{fivek,
	author = "Vladimir Bychkovsky and Sylvain Paris and Eric Chan and Fr{\'e}do Durand",
	title = "Learning Photographic Global Tonal Adjustment with a Database of Input / Output Image Pairs",
	booktitle = "The Twenty-Fourth IEEE Conference on Computer Vision and Pattern Recognition",
	year = "2011"
}

Tools for MIT-Adobe FiveK Dataset

This repository provides tools to download and use the MIT-Adobe FiveK Dataset in a machine-learning-friendly manner.

The official archive has a complicated directory structure, and the expert images need to be downloaded individually. To simplify this process, I created a tool that downloads all the data with a single line of Python code.

In practice, the dataset is often used after the RAW images have undergone various processing steps, such as adding noise, overexposure, or underexposure, to emulate camera errors. This tool makes such processing easy to perform by iterating over the data with PyTorch's DataLoader.
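
To make the kind of degradation mentioned above concrete, here is a minimal pure-Python sketch of exposure shifts and additive noise on linear-light values in [0, 1]. It is a stand-in only: the function names are hypothetical, and a real pipeline would first decode the DNG file and would most likely operate on tensors rather than lists.

```python
import random
from typing import List, Optional


def adjust_exposure(pixels: List[float], ev: float) -> List[float]:
    """Scale linear-light values by 2**ev stops and clip to [0, 1]."""
    gain = 2.0 ** ev
    return [min(1.0, max(0.0, p * gain)) for p in pixels]


def add_gaussian_noise(
    pixels: List[float], sigma: float, seed: Optional[int] = None
) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


pixels = [0.125, 0.25, 0.75]
over = adjust_exposure(pixels, ev=1.0)    # [0.25, 0.5, 1.0] (0.75 * 2 clips to 1.0)
under = adjust_exposure(pixels, ev=-1.0)  # [0.0625, 0.125, 0.375]
noisy = add_gaussian_noise(pixels, sigma=0.01, seed=0)
```

Positive `ev` values emulate overexposure and negative values emulate underexposure; clipping models sensor saturation.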

Requirements

Python 3.7 or greater
PyTorch 2.x
tqdm
urllib3

Usage

Place dataset/fivek.py and dataset/fivek_builder.py in your project.
Import MITAboveFiveK in your Python code.
Download the dataset by initializing a MITAboveFiveK instance with download=True.

fivek = MITAboveFiveK(root="/datasets", split="debug", download=True, experts=["a"])

Data can then be retrieved iteratively through PyTorch's DataLoader, for example as follows.

NOTE: When using DataLoader, you MUST set `batch_size` to `None` to disable automatic batching.

from torch.utils.data.dataloader import DataLoader
from dataset.fivek import MITAboveFiveK

data_loader = DataLoader(
    MITAboveFiveK(root="path-to-dataset-root", split="train", download=True, experts=["a"]),
    batch_size=None)

for item in data_loader:
    # Processing as you want.
    # Add noise, overexpose, underexpose, etc.
    print(item["files"]["dng"])

Example

Please see the sample code.

Easy Multi-Process Pre-Processing

If you pass a preprocessing function as process_fn, PyTorch's DataLoader can run the preprocessing in multiple processes. See the sample code.

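    from torch.utils.data.dataloader import DataLoader
    from dataset.fivek import MITAboveFiveK

    class Preprocess:
        def hello_world(self, item):
            print(f"hello world! the current ID is {item['id']}")
            # Return the item so the loop below receives it (this assumes the
            # dataset yields whatever process_fn returns).
            return item

    # `args` is assumed to hold parsed command-line options (e.g. from argparse)
    # with `root_dir` and `workers` attributes.
    data_loader = DataLoader(
        MITAboveFiveK(
            root=args.root_dir,
            split="debug",
            process_fn=Preprocess().hello_world),
        batch_size=None,  # must be `None`
        num_workers=args.workers  # multi-process for pre-processing
    )
    for item in data_loader:
        # pre-processing has already been performed.
        print(item)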
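
A process_fn can also carry configuration if it is written as a callable class. The sketch below is hypothetical (the class name `SelectPair` and the output keys are not part of this repository) and assumes that the dataset yields whatever process_fn returns. It is tried here on a hand-written item in the documented format, so no download is needed.

```python
from typing import Any, Dict


class SelectPair:
    """Hypothetical process_fn: keep only the fields needed for one expert."""

    def __init__(self, expert: str) -> None:
        self.expert = expert

    def __call__(self, item: Dict[str, Any]) -> Dict[str, Any]:
        return {
            "id": item["id"],
            "input": item["files"]["dng"],
            "target": item["files"]["tiff16"][self.expert],
        }


# Stand-in item written by hand in the documented format.
item = {
    "id": 1384,
    "files": {
        "dng": "/datasets/MITAboveFiveK/raw/Leica_M8/a1384-dvf_095.dng",
        "tiff16": {"a": "/datasets/MITAboveFiveK/processed/tiff16_a/a1384-dvf_095.tif"},
    },
}
pair = SelectPair(expert="a")(item)
print(pair["target"])
```

It would be passed as `process_fn=SelectPair("a")`. An instance of a module-level class can be pickled, unlike a lambda, which matters when `num_workers` is greater than 0 on platforms that start worker processes by spawning.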

API

CLASS MITAboveFiveK(torch.utils.data.dataset.Dataset)

MITAboveFiveK(root: str, split: str, download: bool = False, experts: List[str] = None, download_workers: int = 1, process_fn: Callable[[Dict[str, Any]], Any] = None) -> None

root (str):
The root directory where the MITAboveFiveK directory exists or will be created.
split (str):
One of {'train', 'val', 'test', 'debug'}. 'debug' uses only 9 samples taken from 'train'.
download (bool):
If True, downloads the dataset from the official URLs. Files that already exist locally are not downloaded again. Defaults to False.
experts (List[str]):
List of {'a', 'b', 'c', 'd', 'e'}. 'a' means 'Expert A' on the website. If None or an empty list, no expert data is used. Defaults to None.
download_workers (int):
How many subprocesses to use for data downloading. None means min(32, cpu_count() + 4) is used. Defaults to 1.
process_fn (Callable[[Dict[str, Any]], Any]):
A function applied to each element of the dataset inside __getitem__(). Defaults to None.
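
The rule for download_workers can be written out as a small helper. This is only a sketch of the documented behavior, not the class's actual code, and the function name is hypothetical.

```python
import os
from typing import Optional


def resolve_download_workers(download_workers: Optional[int]) -> int:
    """Mirror the documented rule: None means min(32, cpu_count() + 4)."""
    if download_workers is None:
        # os.cpu_count() can return None when the count cannot be determined.
        return min(32, (os.cpu_count() or 1) + 4)
    return download_workers


print(resolve_download_workers(4))     # 4
print(resolve_download_workers(None))  # depends on the machine, at most 32
```

The same formula is the default `max_workers` of `concurrent.futures.ThreadPoolExecutor` since Python 3.8; whether the tool relies on that executor internally is an assumption.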

Format to be acquired by DataLoader

{
    "basename": "<(str) basename of the image>",
    "files": {
        "dng": "<(str) path of the local DNG file>",
        "tiff16": {
            "a": "<(str) path of the local TIFF file retouched by Expert A>",
            "b": "<(str) path of the local TIFF file retouched by Expert B>",
            "c": "<(str) path of the local TIFF file retouched by Expert C>",
            "d": "<(str) path of the local TIFF file retouched by Expert D>",
            "e": "<(str) path of the local TIFF file retouched by Expert E>"
        }
    },
    "categories": {
        "location": "<(str) image categories extracted from the official resource>",
        "time": "<(str) image categories extracted from the official resource>",
        "light": "<(str) image categories extracted from the official resource>",
        "subject": "<(str) image categories extracted from the official resource>"
    },
    "id": <(int) id extracted from basename>,
    "license": "<(str) Adobe or AdobeMIT>",
    "camera": {
        "make": "<(str) maker name extracted from dng>",
        "model": "<(str) camera model name extracted from dng>"
    }
}
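
The "id" field is described as being extracted from the basename. A minimal sketch of one way to do that follows. The naming pattern (the letter "a", four digits, then a hyphen) is an assumption inferred from the sample file names, and `id_from_basename` is a hypothetical helper rather than a function of this repository.

```python
import re

# Assumed naming pattern: "a" + four digits + "-" + original file name.
_BASENAME_ID = re.compile(r"^a(\d{4})-")


def id_from_basename(basename: str) -> int:
    """Return the numeric id encoded at the start of a basename."""
    match = _BASENAME_ID.match(basename)
    if match is None:
        raise ValueError(f"unexpected basename: {basename!r}")
    return int(match.group(1))


print(id_from_basename("a1384-dvf_095"))       # 1384
print(id_from_basename("a0001-jmac_DSC1459"))  # 1
```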

Example

from torch.utils.data.dataloader import DataLoader
from dataset.fivek import MITAboveFiveK

data_loader = DataLoader(
    MITAboveFiveK(root="/datasets", split="debug", download=True, experts=["a", "c"], download_workers=4),
    batch_size=None)
item = next(iter(data_loader))
print(item)
# 
# Output↓
# {'categories': {'location': 'outdoor', 'time': 'day', 'light': 'sun_sky', 'subject': 'nature'}, 
#  'id': 1384, 'license': 'Adobe', 
#  'camera': {'make': 'Leica', 'model': 'M8'}, 
#  'files': {'dng': '/datasets/MITAboveFiveK/raw/Leica_M8/a1384-dvf_095.dng', 
#            'tiff16': {'a': '/datasets/MITAboveFiveK/processed/tiff16_a/a1384-dvf_095.tif', 
#                       'c': '/datasets/MITAboveFiveK/processed/tiff16_c/a1384-dvf_095.tif'}}, 
#  'basename': 'a1384-dvf_095'}
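
After a download, it can be useful to confirm that every path listed in an item is actually on disk. The helper below is a hypothetical sketch that relies only on the item format shown above; it uses `.get` for "tiff16" because it is an assumption whether that key is present when no experts are requested.

```python
from pathlib import Path
from typing import Any, Dict, List


def missing_files(item: Dict[str, Any]) -> List[str]:
    """Return every path referenced by item["files"] that is not on disk."""
    files = item["files"]
    paths = [files["dng"], *files.get("tiff16", {}).values()]
    return [p for p in paths if not Path(p).is_file()]
```

Calling `missing_files(item)` inside the loop over the DataLoader returns an empty list when the item is complete.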

Directory Structure

When a dataset is downloaded using the MITAboveFiveK class, the files are saved in the following structure. RAW images are stored in a directory for each camera model.

<root>
└── MITAboveFiveK
    ├── raw
    |   ├── Canon_EOS-1D_Mark_II
    |   |   ├── a1527-20041010_072954__E6B5620.dng
    |   |   └── ...
    |   ...
    |   └── Sony_DSLR-A900
    |       ├── a4337-kme_1082.dng
    |       └── ...
    ├── processed
    |   ├── tiff16_a
    |   |   ├── a0001-jmac_DSC1459.tif
    |   |   └── ...
    |   ├── tiff16_b
    |   └── ...
    ├── training.json
    ├── validation.json
    ├── testing.json
    └── debugging.json
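
The layout above can be expressed as two small path helpers. This is a sketch under stated assumptions: the helper names are hypothetical, and the rule that the camera directory is the camera model name with spaces replaced by underscores is inferred from the listing, not taken from the code.

```python
from pathlib import PurePosixPath


def raw_path(root: str, camera_model: str, basename: str) -> PurePosixPath:
    """Build the expected location of a DNG file."""
    # Assumption: directory name = camera model with spaces replaced by "_".
    camera_dir = camera_model.replace(" ", "_")
    return PurePosixPath(root) / "MITAboveFiveK" / "raw" / camera_dir / f"{basename}.dng"


def tiff_path(root: str, expert: str, basename: str) -> PurePosixPath:
    """Build the expected location of an expert-retouched TIFF file."""
    return (
        PurePosixPath(root) / "MITAboveFiveK" / "processed"
        / f"tiff16_{expert}" / f"{basename}.tif"
    )


print(raw_path("/datasets", "Leica M8", "a1384-dvf_095"))
# /datasets/MITAboveFiveK/raw/Leica_M8/a1384-dvf_095.dng
print(tiff_path("/datasets", "c", "a1384-dvf_095"))
# /datasets/MITAboveFiveK/processed/tiff16_c/a1384-dvf_095.tif
```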

Resources

I provide JSON files that contain metadata for each image.

Split	JSON File	Number of Images	Note
train	training.json	3500
val	validation.json	500
test	testing.json	1000
debug	debugging.json	9	Subset of train

yuukicammy / mit-adobe-fivek-dataset
