Mayo-Clinic-STRIP-AI

Goal

The objective is to build a model that can classify two major acute ischemic stroke etiology subtypes:

CE (Cardioembolic)
LAA (Large Artery Atherosclerosis)

Dataset Description

The dataset for this competition comprises over a thousand high-resolution whole-slide digital pathology images. Each slide depicts a blood clot from a patient who had experienced an acute ischemic stroke. The slides in the training and test sets depict clots with an etiology (that is, origin) known to be either CE (Cardioembolic) or LAA (Large Artery Atherosclerosis).

Data Field Descriptions

train.csv:

image_id: A unique identifier for this instance having the form {patient_id}_{image_num}. Corresponds to the image {image_id}.tif.
center_id: Identifies the medical center where the slide was obtained.
patient_id: Identifies the patient from whom the slide was obtained.
image_num: Enumerates images of clots obtained from the same patient.
label: The etiology of the clot, either CE or LAA. This field is the classification target.

Example rows from train.csv:

image_id	center_id	patient_id	image_num	label
008e5c_0	11	008e5c	0	CE
00c058_0	11	00c058	0	LAA
026c97_0	4	026c97	0	CE
049194_0	5	049194	0	CE
049194_1	5	049194	1	CE
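
Because image_id encodes patient_id and image_num, and one patient can contribute several slides, it may be worth checking that relationship and keeping each patient's slides on one side of a train/validation split. The following is a minimal sketch using only the standard library, not the code used for the submissions. `load_rows` and `split_by_patient` are hypothetical helper names, the sample assumes train.csv is comma-separated, and the 20% validation fraction is an arbitrary choice. Reading every field as a string keeps the leading zero in IDs such as 049194.

```python
import csv
import io
import random

# A few rows in the shape described above (assumed comma-separated).
SAMPLE = """image_id,center_id,patient_id,image_num,label
008e5c_0,11,008e5c,0,CE
00c058_0,11,00c058,0,LAA
026c97_0,4,026c97,0,CE
049194_0,5,049194,0,CE
049194_1,5,049194,1,CE
"""


def load_rows(handle):
    """Read rows as strings and check image_id == {patient_id}_{image_num}."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        expected = f"{row['patient_id']}_{row['image_num']}"
        if row["image_id"] != expected:
            raise ValueError(f"unexpected image_id: {row['image_id']}")
    return rows


def split_by_patient(rows, val_fraction=0.2, seed=0):
    """Split rows so that no patient appears in both train and validation."""
    patients = sorted({row["patient_id"] for row in rows})
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])
    train = [r for r in rows if r["patient_id"] not in val_patients]
    val = [r for r in rows if r["patient_id"] in val_patients]
    return train, val


rows = load_rows(io.StringIO(SAMPLE))
train_rows, val_rows = split_by_patient(rows)
```

For the real file, `load_rows(open("train.csv", newline=""))` would be the equivalent call.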

Preprocessing

The training WSIs (whole-slide images) are very large files because of their high resolution. I was able to shrink the dataset from ~241 gigabytes to a few gigabytes. The preprocessing steps can be summarized as follows:

Load large .tif WSI
Crop WSI using PyVips smart crop with attention features
Resize image to specified width x height
Delete parts of image that contain low signal
Export as JPEG with quality set to 100%
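
The steps above could be sketched with PyVips roughly as follows. This is an illustration under stated assumptions rather than the exact script used: `preprocess_slide` and `output_path` are hypothetical names, the 1024 x 1024 target size and the trim threshold are placeholders, `thumbnail` is used to combine the attention crop and the resize in one call, and `find_trim` against a white background stands in for the low-signal removal step.

```python
from pathlib import Path


def output_path(tif_path, out_dir):
    """Map e.g. train/008e5c_0.tif to <out_dir>/008e5c_0.jpg."""
    return Path(out_dir) / (Path(tif_path).stem + ".jpg")


def preprocess_slide(tif_path, out_dir, width=1024, height=1024,
                     trim_threshold=10.0):
    """Shrink one WSI to a small JPEG. Sizes and threshold are placeholders."""
    import pyvips  # imported here so output_path works without libvips

    # Load + smart crop + resize: thumbnail() shrinks while loading, which
    # avoids decoding the full-resolution slide, and crop="attention"
    # keeps the region libvips considers most interesting.
    image = pyvips.Image.thumbnail(
        str(tif_path), width, height=height, crop="attention"
    )

    # Assumption: treat near-white background as "low signal" and trim it.
    if image.hasalpha():
        image = image.flatten(background=[255, 255, 255])
    left, top, trim_w, trim_h = image.find_trim(
        threshold=trim_threshold, background=[255, 255, 255]
    )
    if trim_w > 0 and trim_h > 0:
        image = image.crop(left, top, trim_w, trim_h)

    # Export as JPEG at maximum quality.
    dest = output_path(tif_path, out_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    image.write_to_file(str(dest), Q=100)
    return dest
```

Calling `preprocess_slide("train/008e5c_0.tif", "train_jpg")` would write `train_jpg/008e5c_0.jpg`; the directory names here are examples only.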

Training

I was only able to submit two entries for evaluation because of time constraints and difficulty loading images without running out of memory. First, I tried AutoGluon with swin_large_patch4_window7_224. Second, I used Keras with TensorFlow to apply transfer learning and fine-tuning, starting from the latest EfficientNet B4 weights pre-trained with NoisyStudent + RandAugment.
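
The transfer learning and fine-tuning approach generally follows a two-phase pattern: train a new classification head on a frozen backbone, then unfreeze the backbone and continue at a much lower learning rate. The sketch below shows that pattern with Keras and is not the submitted model. `build_classifier` and `unfreeze_for_fine_tuning` are hypothetical names, the 380-pixel input size, dropout rate, and learning rates are assumptions, and the `imagenet` weights bundled with Keras stand in for the NoisyStudent checkpoint, which would have to be loaded separately.

```python
def build_classifier(image_size=380, weights="imagenet"):
    """Phase 1: frozen EfficientNet B4 backbone with a new binary head."""
    import tensorflow as tf  # imported here so the module loads without TF

    base = tf.keras.applications.EfficientNetB4(
        include_top=False,
        weights=weights,  # stand-in; NoisyStudent weights are not bundled
        input_shape=(image_size, image_size, 3),
        pooling="avg",
    )
    base.trainable = False  # only the new head is trained at first

    inputs = tf.keras.Input(shape=(image_size, image_size, 3))
    x = base(inputs, training=False)  # keep BatchNorm in inference mode
    x = tf.keras.layers.Dropout(0.2)(x)
    # Single sigmoid output; which class it represents is an assumption
    # (here: probability of LAA, with CE as 1 - p).
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="binary_crossentropy",
    )
    return model, base


def unfreeze_for_fine_tuning(model, base, learning_rate=1e-5):
    """Phase 2: unfreeze the backbone and recompile with a low learning rate."""
    import tensorflow as tf

    base.trainable = True
    for layer in base.layers:
        # BatchNorm layers stay frozen so their statistics are not disturbed.
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="binary_crossentropy",
    )
    return model
```

A typical use would be to call `model.fit(...)` once after `build_classifier()` and again after `unfreeze_for_fine_tuning(model, base)`.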

I attempted to use MONAI, fastMONAI, PathML, and cuCIM, but I ran into problems loading the WSIs (memory constraints or unknown errors) or slow processing. However, these libraries appear promising, and I would like to experiment with them again in the future.

Additionally, this challenge introduced me to the concept of MIL (multiple instance learning) and how it can be used to train on WSIs by easing memory constraints and working with unmodified tiles. Finally, I plan on going through the winning solutions and attempting to understand other approaches to tackling this challenge.
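
To make the MIL idea concrete: each slide is treated as a bag of tiles, only the bag has a label, and tile-level scores are pooled into one bag-level score. The sketch below shows only that pooling step with hand-written scores, using the standard library. It was not part of my submissions, and in a full MIL model the pooling (often attention-based) is trained together with the tile encoder. `pool_bag` and `patient_predictions` are hypothetical names, and averaging a patient's slides is an assumption.

```python
from collections import defaultdict
from statistics import fmean


def pool_bag(tile_probs, mode="mean"):
    """Pool tile-level probabilities into a single slide-level probability."""
    if not tile_probs:
        raise ValueError("a bag needs at least one tile")
    if mode == "mean":
        return fmean(tile_probs)
    if mode == "max":
        return max(tile_probs)
    raise ValueError(f"unknown pooling mode: {mode}")


def patient_predictions(tile_probs_by_image, mode="mean"):
    """Pool tiles per slide, then average the slides of each patient.

    Keys are image_id values of the form {patient_id}_{image_num}.
    """
    by_patient = defaultdict(list)
    for image_id, tile_probs in tile_probs_by_image.items():
        patient_id = image_id.rsplit("_", 1)[0]
        by_patient[patient_id].append(pool_bag(tile_probs, mode))
    return {pid: fmean(scores) for pid, scores in by_patient.items()}


# Made-up tile scores for illustration only.
example = {
    "049194_0": [0.2, 0.4],
    "049194_1": [0.6, 0.8],
    "008e5c_0": [0.9],
}
preds = patient_predictions(example)
```

Mean pooling treats every tile equally, while max pooling lets a single confident tile decide the slide; which works better here is something I have not tested.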

Final Results

There were 896 teams competing, 1,025 competitors, and 6,980 entries in total. Based on the final results, my model ranked within the top 28% of submissions and placed 240/888.
