Multiple Instance Learning (MIL) methods are mainstream approaches for pathological image classification and analysis (a minimal sketch of an attention-based MIL method is given after the list of issues below).
The CAMELYON-16/17 datasets are commonly used to evaluate MIL methods.
However, they have the following issues:
- The CAMELYON-16/17 datasets contain a number of problematic slides.
- The pixel-level annotations of the CAMELYON-16/17 test data are not accurate enough.
- Different MIL methods do not share a unified dataset split or a common set of evaluation metrics on the CAMELYON datasets.

To conclude, there is no BENCHMARK for MIL methods.
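As background on the methods being benchmarked, here is a minimal sketch of an attention-based MIL classifier in the spirit of ABMIL: each slide is treated as a bag of pre-extracted patch features, attention weights pool them into a slide-level representation, and a linear head predicts the slide label. The feature dimension, layer sizes, and class count below are illustrative assumptions, not the configuration of any specific benchmarked method.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Minimal attention-based MIL head: patch features of one slide -> slide-level logits."""

    def __init__(self, feat_dim=1024, hidden_dim=256, n_classes=2):  # illustrative sizes
        super().__init__()
        self.attention = nn.Sequential(      # scores each patch feature
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, bag):                  # bag: (num_patches, feat_dim)
        attn = torch.softmax(self.attention(bag), dim=0)  # (num_patches, 1), sums to 1
        slide_feat = (attn * bag).sum(dim=0)              # attention-weighted pooling
        return self.classifier(slide_feat), attn          # slide logits + patch weights

# Example: one slide represented by 500 pre-extracted 1024-d patch features.
logits, attn = AttentionMIL()(torch.randn(500, 1024))
```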
**What do we do in this work?**
We do the following to establish the CAMELYON+ BENCHMARK:
- Remove the problematic slides.
- Correct the problematic annotations.
- Merge the corrected versions of the **CAMELYON-16/17** datasets into the CAMELYON+ dataset.
- Evaluate mainstream MIL methods on the CAMELYON+ dataset.
- Evaluate mainstream feature extractors on the CAMELYON+ dataset (a patch-feature-extraction sketch is given at the end of this section).
- Use a more comprehensive set of evaluation metrics to assess the different methods, as shown in the sketch after this list.
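To make the broader metric set concrete, the sketch below scores binary slide-level predictions with several complementary metrics via scikit-learn. The specific metric list and the fixed 0.5 threshold are illustrative assumptions and may not match the benchmark's exact protocol.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, f1_score, roc_auc_score)

def evaluate_slide_predictions(y_true, y_prob, threshold=0.5):
    """Score binary slide-level predictions with a set of complementary metrics."""
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "auc": roc_auc_score(y_true, y_prob),               # threshold-free ranking quality
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),  # robust to class imbalance
        "f1": f1_score(y_true, y_pred),
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),   # agreement beyond chance
    }

# Example with dummy labels and predicted probabilities.
print(evaluate_slide_predictions([0, 1, 1, 0], [0.2, 0.8, 0.6, 0.4]))
```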
In summary, we establish a new CAMELYON+ BENCHMARK.
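For reference, the sketch below shows one way per-slide patch features could be extracted before MIL training, using an ImageNet-pretrained ResNet-50 from torchvision with its classification head removed. The backbone choice, preprocessing, and the `extract_patch_features` helper are illustrative assumptions; the feature extractors actually benchmarked (including pathology-specific pretrained models) are configured differently.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# ImageNet-pretrained ResNet-50 with the classifier removed -> 2048-d patch embeddings.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_patch_features(patch_paths):
    """Stack embeddings of one slide's patch images into a (num_patches, 2048) tensor."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in patch_paths])
    return backbone(batch)
```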