adriano-lucieri / SCDB

Simple Concept DataBase for concept learning and localization.

Simple Concept DataBase

SCDB is a synthetic dataset developed for concept localization and inspired by the challenges of skin lesion classification using dermatoscopic images. It mimics the complex composition of diagnostic criteria in skin lesions, such as spatial overlap, and provides concept annotations and concept segmentation masks.

If you use this dataset, please consider citing our associated paper:

    @InProceedings{lucieri2020explaining,
    author="Lucieri, Adriano
    and Bajwa, Muhammad Naseer
    and Dengel, Andreas
    and Ahmed, Sheraz",
    title="Explaining AI-Based Decision Support Systems Using Concept Localization Maps",
    booktitle="Neural Information Processing",
    year="2020",
    publisher="Springer International Publishing",
    address="Cham",
    pages="185--193",
    isbn="978-3-030-63820-7"
    }

Dataset Description

Skin lesions are represented as large geometric base shapes filled with concepts, which are represented as smaller geometries that are randomly coloured, shaped, and oriented. Ten shapes, each representing a single concept, are used:

  • Cross
  • Ellipse
  • Hexagon
  • Line
  • Pentagon
  • Rectangle
  • Star
  • Starmarker
  • Triangle
  • Tripod

Concepts relevant to the target classification task occur only within the area of the base shape. Eight of the 10 concept classes are relevant for classification; the remaining two (Cross, Line) are uncorrelated with the target classes. Target classes are indicated by the following concept combinations:

Target Class    Indicative Concept Combinations
C1              Hexagon & Star
                Ellipse & Star
                Triangle & Ellipse & Starmarker
C2              Pentagon & Tripod
                Star & Tripod
                Rectangle & Star & Starmarker
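
The combinations in the table above can be checked programmatically. The following is a minimal sketch, not the dataset's generation code: the names `INDICATIVE_COMBINATIONS` and `indicated_classes` are hypothetical, and it assumes a combination counts as present whenever all of its concepts occur in a sample. This README does not say how samples matching both classes are handled, so the sketch simply returns every matching class.

```python
# Indicative concept combinations, copied from the table in this README.
INDICATIVE_COMBINATIONS = {
    "C1": [
        {"Hexagon", "Star"},
        {"Ellipse", "Star"},
        {"Triangle", "Ellipse", "Starmarker"},
    ],
    "C2": [
        {"Pentagon", "Tripod"},
        {"Star", "Tripod"},
        {"Rectangle", "Star", "Starmarker"},
    ],
}


def indicated_classes(concepts):
    """Return the sorted target classes whose combinations are fully present.

    Assumption: a combination is present when all of its concepts appear
    in the sample. Extra concepts (e.g. Cross, Line) are ignored.
    """
    present = set(concepts)
    return sorted(
        target
        for target, combos in INDICATIVE_COMBINATIONS.items()
        if any(combo <= present for combo in combos)
    )


# Cross is uncorrelated with the target classes and does not affect the result.
example = indicated_classes(["Hexagon", "Star", "Cross"])
```

Working with concept names rather than column indices keeps the sketch independent of how the annotation vectors are ordered.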

Dataset Files

For each dataset split (train, val, test), label annotations (.csv) as well as concept annotations (.npy) are available. A separate concept split can be used for training Concept Activation Vectors (CAVs).

Label Files

The .csv files are provided in the form "filepath|label".
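
As a rough sketch of reading such a file with Python's standard `csv` module: the helper name `read_labels` and the sample rows are hypothetical, and the sketch assumes there is no header row and that labels can be kept as strings. If the real files do contain a header line, it would need to be skipped first.

```python
import csv
import io


def read_labels(handle):
    """Parse 'filepath|label' rows into (filepath, label) tuples.

    Assumptions: '|' is the only delimiter, labels stay as strings, and
    rows without exactly two fields (such as blank lines) are skipped.
    """
    reader = csv.reader(handle, delimiter="|")
    return [(row[0], row[1]) for row in reader if len(row) == 2]


# Hypothetical sample rows; real paths and label values may differ.
sample = io.StringIO("images/0001.png|C1\n\nimages/0002.png|C2\n")
rows = read_labels(sample)
```

For a file on disk, the same function can be called with `open("train.csv", newline="")` as the handle.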

Concept Files

Concept annotations are provided as binary multi-label vectors, stacked into an array of shape [N x 10], where N is the number of samples.
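
A single annotation vector can be turned back into concept names as sketched below. The column order is an assumption: this README does not state it, so the sketch uses the alphabetical order of the shape list above and should be checked against the actual data. `CONCEPT_NAMES` and `decode_concepts` are hypothetical names.

```python
# ASSUMPTION: columns follow the alphabetical shape list from this README.
CONCEPT_NAMES = [
    "Cross", "Ellipse", "Hexagon", "Line", "Pentagon",
    "Rectangle", "Star", "Starmarker", "Triangle", "Tripod",
]


def decode_concepts(vector):
    """Map one binary vector of length 10 to the names of its active concepts."""
    if len(vector) != len(CONCEPT_NAMES):
        raise ValueError(f"expected {len(CONCEPT_NAMES)} entries, got {len(vector)}")
    return [name for name, flag in zip(CONCEPT_NAMES, vector) if int(flag) == 1]


# Under the assumed order, columns 1 and 6 are Ellipse and Star.
active = decode_concepts([0, 1, 0, 0, 0, 0, 1, 0, 0, 0])
```

With NumPy installed, `numpy.load("train.npy")` would return the full array, and each row can be passed to `decode_concepts` unchanged.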

Segmentation Files

Each split folder contains a Segmentation folder that holds up to 10 concept-specific segmentation maps per sample. Each concept is segmented by a circle that covers the complete outline of its shape.

Dataset Distribution

Split         Datafile       Annotations    Samples
Train         train.csv      train.npy      4800
Validation    val.csv        val.npy        1200
Test          test.csv       test.npy       1500
Concept       concept.csv    concept.npy    6000
