Computer-aid diagnosis (CADx) and detection (CADe) for breast cancer in mammography with combined UNet and ResNet
This project aims to develop an approach which could improve current CADx, CADe performance in breast cancer assesment using mammography studies. Model has to parts: sementic segmentation (UNet) and classification part (ResNet). Both models are trained separetely, since there were significant time and memory restrictions in the project.
- Create new virtaulenv:
$ virtualenv -p python3 breast_cancer_research_venv
- Source environment:
$ source breast_cancer_rsearch_venv/bin/activate
- Install required packages:
$ pip install -r requirements.txt
- Setup internal sources (for debugging advised in development mode):
$ pip install -e .
CBIS-DDSM is a database of 2,620 scanned film mammography studies. It contains benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems.
- Install NBIA Data Retriever and search CBIS-DDSM in Data Portal. Instruction for this task is available in Cancer Imaging Archive website.
- Download CBIS-DDSM data using NBIA Data Retriever. It is important to download data into "Classic Directory Name", since CBIS-DDSM metadata are prepared to work in this folder tree type.
- Change data generation config
/config/data_generation_config.json
so it fits your directories. Config description:downloaded_images_root
: dir to downloaded images (by default endingCBIS-DDSM
)downloaded_metadata_dir
: dir to downloaded metadata (.csv
files)transform_dicom_to_png
: boolean flag which indicates if images would beimages_out_dir
: dir where images in png format should be storedmetadata_out_dir
: dir where new metadata would be stored (ifNone
they would be stored indownloaded_metadata_dir
)train_val_test_split
: list of floats, which should add up to 1,default_split
: prepare train, val, test split for enabling comparing results between researchers
- Run data convertion and metadata preparation scirpt (it could take several hours (i.e. all files have to be converted to .png and single findings would be merged):
$ python scripts/instance_2_semantic_data.py --data_config_path configs/data_generation_config.csv
Arguments:
--data_config_path
: config for data convertion and metadata preparation
Two models, pretrained on train+validation set, are available:
- unet:
pretreined_models/unet.pth
- resnet:
resnet_models/resnet.pth
python scripts/train_unet.py --unet_config_path configs/unet_config_train.json --train_metadata_path data/semantic_labels/train_metadata.csv
--val_metadata_path data/semantic_labels/val_metadata.csv
Arguments:
--unet_config_path
: UNet configuration file--train_metadata_path
: path to train metadata--val_metadata_path
: path to validation metadata--cross_validation
: boolean which indicateas whethear cross-validation should be used. If set, params which would be used for cross validation are taken from--unet_config_path
python scripts/evaluate_unet.py --unet_config_path configs/unet_config_eval.json --metadata_path data/semantic_labels/test_metadata.csv
--predict_images
Arguments:
--unet_config_path
: UNet configuration file--metadata_path
: path to test metadata--predict
: boolean which indicateas whethear images should be stored to tensorboard
python scripts/train_resnet.py --resnet_config_path configs/resnet_config_train.json --train_metadata_path data/semantic_labels/train_metadata.csv
--val_metadata_path data/semantic_labels/val_metadata.csv --unet_config_path configs/unet_config.eval.json
Arguments:
--resnet_config_path
: ResNet configuration file--train_metadata_path
: path to train metadata--val_metadata_path
: path to validation metadata--unet_config_path
: UNet configuration file (for processing input data to ResNet)--cross_validation
: boolean which indicateas whethear cross-validation should be used. If set, params which would be used for cross validation are taken from--resnet_config_path
python scripts/evaluate_resnet.py --resnet_config_path configs/resnet_config_eval.json --metadata_path data/semantic_labels/test_metadata.csv
--unet_config_path configs/unet_config.eval.json --predict_images
Arguments:
--resnet_config_path
: UNet configuration file--metadata_path
: path to test metadata--unet_config_path
: UNet configuration file (for processing input data to ResNet)--predict_images
: boolean which indicateas whethear images should be stored to tensorboard
This work was significantly influences by recent New York University research. UNet implementation is based on this repository.