
Multimodal Medicine AI

🦘 Multimodal Medical Datasets

Treatment Response Evaluation

| Year | Paper | Code | Cancer | Modalities | Data Source | Patients | Fusion Mode |
|------|-------|------|--------|------------|-------------|----------|-------------|
| 2023 | 🔗 | | ccRCC | Gene | TCGA, In-House | ~1000 | Middle |
| 2023 | 🔗 | 🔗 | 18 solid tumor types | Path, Gene, Clin | In-House | 2881 | Middle |
| 2023 | 🔗 | 🔗 | GC | Rad, Clin | In-House | 249 | Middle |
| 2023 | 🔗 | | NSCLC | Rad, Clin | In-House | 264 | Late |
| 2022 | 🔗 | 🔗 | NSCLC | Rad, Path, Gene | In-House | 249 | Middle |
| 2022 | 🔗 | | Non-Cancer (CRT response) | Rad | UKBB, EchoNet-Dynamic, GSTFT, GSTFT CRT | 62 for response prediction; 10,730 for training segmentation models | Middle |
| 2021 | 🔗 | | NSCLC | Rad, Lab, Clin | In-House | 200 | Middle |
| 2020 | 🔗 | | HCC | Rad | In-House | 737 | Middle |
[Nov. 2023] Multi-omics features-based machine learning method improve immunotherapy response in clear cell renal cell carcinoma, bioRxiv

Paper

  • Cancer: Clear Cell Renal Cell Carcinomas
  • Modalities: Gene Data (bulk RNA, scRNA, DNA)
  • Data Source: TCGA, In-House dataset
  • Patients: >1900 patients with immune-mediated kidney disorders; >400 patients with ccRCC treated with ICBs; ~1000 patients as the immune cohort for ccRCC
  • Pipeline:
    • extracting six distinct types of features (TIs) from multimodal gene data
    • using XGBoost to predict response based on these features
  • Fusion Mode: Middle-fusion, using XGBoost to integrate multimodal features (see the sketch below)
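
A minimal sketch (not the authors' code) of this middle-fusion setup: per-modality feature blocks are concatenated into a single feature table and fed to an XGBoost classifier. The modality dimensions and the synthetic data below are placeholders.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
bulk_rna = rng.normal(size=(n, 50))                    # e.g. signature scores from bulk RNA-seq
scrna = rng.normal(size=(n, 20))                       # e.g. cell-type fractions from scRNA-seq
dna = rng.integers(0, 2, size=(n, 10)).astype(float)   # e.g. mutation indicators
y = rng.integers(0, 2, size=n)                         # responder vs. non-responder label

X = np.concatenate([bulk_rna, scrna, dna], axis=1)     # middle fusion: one feature table
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.05, eval_metric="logloss")
clf.fit(X_tr, y_tr)
print("held-out response probabilities:", clf.predict_proba(X_te)[:5, 1])
```
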
[Jul. 2023] Robust prediction of patient outcomes with immune checkpoint blockade therapy for cancer using common clinical, pathologic, and genomic features, bioRxiv

Paper Code

  • Cancer: 18 solid tumor types
  • Modalities: Pathologic, Gene Data, Clinical Data
  • Data Source: In-House dataset
  • Patients: 2881 immune checkpoint blockade (ICB)-treated patients across 18 solid tumor types
  • Pipeline: Using machine learning (e.g., decision tree, random forest) to take the clinical, pathologic, and genomic features as inputs and make predictions
  • Fusion Mode: Middle-fusion, using ML algorithms to integrate multimodal features
⭐️ [Jul. 2023] Cancer immunotherapy response prediction from multi-modal clinical and image data using semi-supervised deep learning, Radiotherapy and Oncology

Paper Code

  • Cancer: Gastric Cancer
  • Modalities: Radiological Images (CTs), Clinical Data
  • Data Source: In-House
  • Patients: 249 advanced gastric cancer patients treated with immunotherapy, plus an additional dataset of 2029 patients who did not receive immunotherapy, used in a semi-supervised framework to learn intrinsic imaging phenotypes of the disease
    • 168 advanced GC patients treated with immunotherapy for training
    • two independent cohorts of 81 patients treated with immunotherapy for evaluating model performance
  • Pipeline:
    • an MLP for extracting clinical features from clinical data
    • an MLP for mapping radiomics features extracted from CTs
    • a CNN for extracting deep image features from CTs
    • concatenating these features into a multimodal feature vector and predicting response/non-response via an MLP
    • this work innovatively employs a semi-supervised framework to leverage unlabeled examples (patients not treated with immunotherapy). Specifically, for labeled examples, a consistency loss aligns the teacher model's predictions (from multimodal features) with the student model's predictions (from deep image features only); for unlabeled examples, the consistency loss aligns the teacher model's predictions (on weakly augmented CTs) with the student model's predictions (on strongly augmented CTs). The teacher model is an EMA (exponential moving average) of the student model (see the sketch below).
  • Fusion Mode: Middle-fusion, concatenating multimodal features for predictions via an MLP
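
A minimal sketch, assuming PyTorch, of the mean-teacher consistency idea on the unlabeled branch: the teacher is an EMA copy of the student, the student sees a strongly augmented view and the teacher a weakly augmented view, and an MSE consistency loss aligns their predictions. The toy MLP and noise-based augmentations stand in for the paper's CNN/MLP encoders and CT augmentations.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
teacher = copy.deepcopy(student)
for p in teacher.parameters():              # the teacher is never updated by gradients
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def ema_update(teacher, student, decay=0.99):
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(decay).add_(ps, alpha=1 - decay)

x = torch.randn(16, 64)                     # deep image features (placeholder batch)
weak = x + 0.01 * torch.randn_like(x)       # stand-in for weak augmentation
strong = x + 0.10 * torch.randn_like(x)     # stand-in for strong augmentation

student_logits = student(strong)            # student sees the strongly augmented view
with torch.no_grad():
    teacher_logits = teacher(weak)          # teacher sees the weakly augmented view

loss = F.mse_loss(student_logits.softmax(-1), teacher_logits.softmax(-1))
loss.backward()
opt.step()
opt.zero_grad()
ema_update(teacher, student)                # teacher tracks an EMA of the student
```
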
[Mar. 2023] Integration of longitudinal deep-radiomics and clinical data improves the prediction of durable benefits to anti-PD-1/PD-L1 immunotherapy in advanced NSCLC patients, Journal of Translational Medicine

Paper

  • Cancer: Advanced Non-small Cell Lung Cancer (NSCLC)
  • Modalities: Radiological Images (CTs with follow-ups), Clinical Data (demographic, epidemiologic data, hemogram with follow-ups)
  • Data Source: In-House dataset
  • Patients: 264 patients with pathologically confirmed stage IV NSCLC treated with immunotherapy from two institutions, randomly divided into a training (n=221) and an independent test set (n=43)
  • Pipeline:
    • using Radiomics and NoduleX to extract time-series CT features, which are concatenated as the input of a Random Forest to predict response
    • clinical data are first one-hot encoded and then concatenated as the input of another Random Forest to predict response
    • averaging these two results to get an ensemble prediction (sketched below)
  • Fusion Mode: Late-fusion, averaging multimodal predictions into an ensemble prediction
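
A minimal sketch (synthetic data) of the late-fusion step: one random forest is trained on imaging-derived features, another on clinical features, and the final output is the average of the two predicted probabilities.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 150
radiomic_feats = rng.normal(size=(n, 40))            # time-series CT features, flattened
clinical_feats = rng.integers(0, 3, size=(n, 12))    # one-hot / categorical clinical data
y = rng.integers(0, 2, size=n)                       # durable benefit yes/no

rf_img = RandomForestClassifier(n_estimators=300, random_state=0).fit(radiomic_feats, y)
rf_cli = RandomForestClassifier(n_estimators=300, random_state=0).fit(clinical_feats, y)

p_img = rf_img.predict_proba(radiomic_feats)[:, 1]
p_cli = rf_cli.predict_proba(clinical_feats)[:, 1]
p_ensemble = (p_img + p_cli) / 2                     # late fusion: average the predictions
print(p_ensemble[:5])
```
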
⭐️ [Aug. 2022] Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer, Nature Cancer

Paper Code

  • Cancer: Non-small Cell Lung Cancer, predicting immunotherapy response
  • Modalities: Radiological Images (CTs), Pathological Images (digitized programmed death ligand-1 immunohistochemistry slides), Gene Data
  • Data Source: In-House Dataset
  • Patients: 249 patients at Memorial Sloan Kettering (MSK) Cancer Center with advanced NSCLC who received PD-(L)1-blockade-based therapy with baseline data and known outcomes between 2014 and 2019
  • Pipeline:
    • extracting radiomics features using expert segmented thoracic CT scans (Radiology Radiomics per site)
    • extracting image-based IHC texture from original digitized PD-L1 IHC slide via the tumor segmentation mask and several visual transformations (Pathology GLCM and TPS)
    • obtaining genomic alterations and TMB
    • DyAM was used for multimodal integration. CT segmentation-derived features were separated by lesion type (lung PC, PL and LN) with separate attention weights applied. Attention weights are also used for genomics and PD-L1 IHC-derived features to result in a final prediction of response.
  • Fusion Mode: Middle-fusion, using a multimodal dynamic attention with masking to integrate multimodal features and address missing data (see the sketch below)
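
A minimal sketch, assuming PyTorch, of attention-based fusion with masking in the spirit of DyAM (not the published implementation): each modality embedding receives a learned attention score, missing modalities are masked out before the softmax, and the attention-weighted sum drives the response prediction.

```python
import torch
import torch.nn as nn

class MaskedAttentionFusion(nn.Module):
    def __init__(self, dim: int, n_modalities: int):
        super().__init__()
        self.score = nn.ModuleList([nn.Linear(dim, 1) for _ in range(n_modalities)])
        self.head = nn.Linear(dim, 1)

    def forward(self, feats, mask):
        # feats: (batch, n_modalities, dim); mask: (batch, n_modalities) with 1 = present
        scores = torch.cat([s(feats[:, i]) for i, s in enumerate(self.score)], dim=1)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=1).unsqueeze(-1)   # missing modalities get weight 0
        fused = (attn * feats).sum(dim=1)
        return torch.sigmoid(self.head(fused))

# placeholder embeddings for e.g. lung/pleural/nodal CT lesions, PD-L1 IHC, genomics
feats = torch.randn(4, 5, 64)
mask = torch.tensor([[1, 1, 1, 1, 1],
                     [1, 0, 1, 1, 0],
                     [1, 1, 0, 0, 1],
                     [0, 1, 1, 1, 1]])
print(MaskedAttentionFusion(64, 5)(feats, mask).shape)      # (4, 1) response probabilities
```
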
[Apr. 2022] A multimodal deep learning model for cardiac resynchronisation therapy response prediction, Medical Image Analysis

Paper

  • Cancer: Non-Cancer, predicting cardiac resynchronisation therapy response
  • Modalities: 2D echocardiography and cardiac magnetic resonance (CMR) data
  • Data Source:
    • UK Biobank (UKBB) for pre-training the CMR segmentation model
    • EchoNet-Dynamic dataset for pre-training the echocardiography segmentation model
    • Guys and St Thomas NHS Foundation Trust (GSTFT) for training and validating the CMR and echocardiography segmentation models
    • GSTFT CRT echocardiography database for testing the proposed model in the intended clinical application of using only echocardiography data at test time
  • Patients:
    • UK Biobank (UKBB): 700 healthy subjects
    • EchoNet-Dynamic dataset: 10,030 patients
    • Guys and St Thomas NHS Foundation Trust (GSTFT): 50 HF patients and 50 CRT patients (32/50 patients who were classified as responders to CRT)
    • GSTFT CRT echocardiography database: 12 CRT patients (7/12 patients who were classified as responders to CRT)
    • a total of 62 patients for response predictions
  • Pipeline:
    • the nnU-Net architecture is used to extract segmentations of the heart over the full cardiac cycle from the two modalities
    • training the multimodal deep learning (MMDL) network by maximizing the correlation between the two modalities' latent representations
    • combining the latent spaces of the nnU-Net models from the two modalities through averaging
    • using an SVM classifier for predicting CRT response
  • Fusion Mode: Middle-fusion, maximizing the correlation between multimodal features and averaging them
⭐️ [Feb. 2021] A multi-omics-based serial deep learning approach to predict clinical outcomes of single-agent anti-PD-1/PD-L1 immunotherapy in advanced stage non-small-cell lung cancer, American Journal of Translational Research

Paper

  • Cancer: Non-small-cell Lung Cancer (NSCLC)
  • Modalities: Radiological Images (serial radiomics), Laboratory Data, Baseline Clinical Data
  • Data Source: In-House Dataset
  • Patients: 200 advanced stage NSCLC patients with 1633 CT scans and 3414 blood samples who received single anti-PD-1/PD-L1 agent between April 2016 and December 2019
  • Pipeline:
    • using the proposed Simple Temporal Attention (SimTA) modules to process asynchronous clinical time series (i.e., the radiomics and blood tests) separately
    • the encoded features of these time series and the static clinical information are then fused by an MLP to get the final output for the assessment prediction of responders/non-responders
  • Fusion Mode: Middle-fusion, concatenating radiomics and blood test features and then using MLP for predictions
[Jun. 2020] Prediction of prognostic risk factors in hepatocellular carcinoma with transarterial chemoembolization using multi-modal multi-task deep learning, eClinicalMedicine

Paper

  • Cancer: Hepatocellular Carcinoma
  • Modalities: Radiological Images (CTs)
  • Data Source: In-house dataset
  • Patients: a total of 737 patients; 478 patients (64.9%) underwent surgical resection, 16 patients (2.2%) underwent liver transplantation, and 243 patients (32.9%) underwent nonsurgical TACE treatment
  • Pipeline:
    • a Random Forest feature selection and an SVM predictor were used to develop the MVI score and Edmondson's score in 494 HCCs with surgical resection
    • multi-task DL networks to build a prognostic score for HCC survival after TACE
      • first, a DAE is used to reduce and transform 2420 radiomics features from 243 HCCs with TACE into 70 new features from the bottleneck hidden layer of the networks
      • then, six time-varying DL algorithms were used to train on the obtained DAE-transformed features, and the one performing best was used to build a prognostic score to compute the survival probabilities on the time grid
    • Finally, the MVI score, Edmondson's score, DL-based survival score and evidence-based clinicoradiologic score were integrated into a Cox-PH model to obtain a precise prediction
  • Fusion Mode: Middle-fusion, using a Cox-PH model to integrate multimodal scores into a prognostic prediction (see the sketch below)
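
A minimal sketch (illustrative random data) of the final integration step: the modality-level scores are combined in a Cox proportional hazards model, here using the lifelines package. Column names are placeholders.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 243
df = pd.DataFrame({
    "mvi_score":          rng.normal(size=n),   # SVM-based MVI score
    "edmondson_score":    rng.normal(size=n),   # SVM-based Edmondson grade score
    "dl_survival_score":  rng.normal(size=n),   # deep-learning survival score
    "clinicoradiologic":  rng.normal(size=n),   # evidence-based clinicoradiologic score
    "time_months":        rng.exponential(scale=24, size=n),
    "event":              rng.integers(0, 2, size=n),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="event")
cph.print_summary()   # hazard ratio of each modality-level score
```
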

Prognosis Evaluation

| Year | Paper | Code | Cancer | Modalities | Data Source | Patients | Fusion Mode |
|------|-------|------|--------|------------|-------------|----------|-------------|
| 2024 | 🔗 | 🔗 | ccRCC | Path, Rad, Clin | In-House, TCGA, CPTAC | 414 | Middle |
| 2024 | 🔗 | 🔗 | Non-Cancer | mRNA, miRNA, methylation | BRCA, ROSMAP, LGG, KIPAN | 351+875+510+658 samples | Middle |
| 2024 | 🔗 | | Breast | Gene, Trans, Prot, Meta, Rad, Path | In-House | 773 | Middle |
| 2023 | 🔗 | 🔗 | Non-Cancer | Rad, Text | MIMIC-CXR, PadChest | ~380k pairs | Middle |
| 2023 | 🔗 | | Non-Cancer | Rad, Non-imaging | ADNI | 248 | Middle |
| 2022 | 🔗 | | BC | Path, Clin, Gene | TCGA | 196 | Middle |
| 2022 | 🔗 | 🔗 | Pan-cancer | Path, Molecular profile data | TCGA | 5720 | Middle |
| 2022 | 🔗 | 🔗 | OC | Rad, Path, Clin | MSKCC, TCGA-OV | 444 | Late |
| 2022 | 🔗 | 🔗 | Non-Cancer | Imaging, Non-Imaging | OAI, ADNI | 4796 (knee OA), 2577 (AD) | Middle |
| 2022 | 🔗 | 🔗 | Non-Cancer | X-ray, Non-Imaging | OAI | 4796 | Middle |
| 2022 | 🔗 | | Brain | Path, Gene | TCGA-LGG, TCGA-GBM | 470 | Middle |
| 2021 | 🔗 | 🔗 | Breast | Path, Clin | In-House, TCGA | 127+123 | Middle |
| 2021 | 🔗 | 🔗 | Five cancer types | Path, Gene | TCGA (BLCA, BRCA, GBMLGG, LUAD, UCEC) | 437+1022+1011+515+538 | Middle |
| 2020 | 🔗 | | Brain | MRIs | BraTS 2019 | 335 | Middle |
| 2020 | 🔗 | 🔗 | Glioma, ccRCC | Path, Gene | TCGA-GBM, TCGA-LGG | 769 | Middle |
| 2020 | 🔗 | 🔗 | GBM | Path, Gene, Clin | TCGA, TCIA | 447 | Middle |
| 2020 | 🔗 | 🔗 | ccRCC | Rad, Path, Gene, Clin | TCGA | 209 | Middle |
| 2019 | 🔗 | 🔗 | Glioblastoma | Clin | SEER | 20821 | Early |
| 2019 | 🔗 | 🔗 | Pancancer | Clin, Gene, Path | TCGA | 11160 | Middle |
| 2017 | 🔗 | | LUAD | Path, Path Reports, Gene, Proteomics | TCGA | 538 | Middle |
⭐️ [Mar. 2024] Deep learning-based multi-model prediction for disease-free survival status of patients with clear cell renal cell carcinoma after surgery: a multicenter cohort study, International Journal of Surgery

Paper Code

  • Cancer: Clear cell renal cell carcinoma (ccRCC) after surgery
  • Modalities: Pathological whole-slide images, CT images, and clinical data
  • Data Source:
    • (General cohort) 238 ccRCC patients receiving radical or partial nephrectomy from January 2008 to December 2016 in Renji hospital were included.
    • (TCGA cohort) 137 patients with ccRCC were recruited from The Cancer Genome Atlas.
    • (CPTAC cohort) 39 ccRCC patients meeting the criteria mentioned above from the Clinical Proteomic Tumor Analysis Consortium
  • Patients: A total of 414 patients.
  • Pipeline:
    • Deep learning-based prediction score (DLPS): for pathological whole-slide images, dividing them into patches (256x256) according to tissue regions; then employing multiple-instance learning to learn features; last, a CNN was used to convert these patches into 2048-dim feature vectors.
    • Machine learning-based pathomics signature (MLPS): based on the authors' previous study, identifying five segmentation features.
    • Radiomics prediction score (RADIS): using PyRadiomics to extract 2400 features from CT images within manually delineated RoIs, then 7 radiomics features were selected via least absolute shrinkage and selection operator (LASSO) regression (see the sketch below).
    • Multi-modal prediction signature (MMPS): applying Cox regression analysis, developing a multi-modal prediction signature (MMPS) based on DLPS, MLPS, RADIS, and clinicopathological features (tumor stage and tumor grade).
  • Fusion Mode: Middle-fusion, using a Cox model to integrate multimodal features
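
A minimal sketch (synthetic data) of a RADIS-style step: a large radiomics matrix is standardized and reduced with LASSO, keeping only the features with non-zero coefficients for the downstream signature. Dimensions and the toy risk target are placeholders.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 414, 2400                                   # patients x radiomics features
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=n)  # toy risk target

Xz = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, n_alphas=50, max_iter=5000, random_state=0).fit(Xz, y)
selected = np.flatnonzero(lasso.coef_)             # indices of retained radiomics features
print(f"kept {selected.size} of {p} radiomics features:", selected[:10])
```
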
[Mar. 2024] MOCAT: multi-omics integration with auxiliary classifiers enhanced autoencoder, BioData Mining

Paper [Code](https://github.com/Yaolab-fantastic/MOCAT)

  • Cancer: Non-Cancer
  • Modalities: mRNA, miRNA, methylation
  • Data Source: BRCA, ROSMAP, LGG, and KIPAN
  • Patients:
    • ROSMAP: NC: 169, AD: 182
    • BRCA: Luminal A: 436, Luminal B: 147, HER2enriched: 46, Normal-like: 115, Basal-like: 131
    • LGG: Grade 2: 246, Grade 3: 264
    • KIPAN: KICH: 66, KIRC: 318, KIRP: 274
  • Pipeline:
    • using modality-specific autoencoder models to extract features from each of the three modalities
    • concatenating the multimodal features and employing multi-head attention to mine inter- and intra-modality feature interactions (see the sketch below)
    • using a classifier to predict disease status
    • using a ConfNet to refine the predicted probabilities (similar to multi-head classification ensemble)
  • Fusion Mode: Middle-fusion, concatenating multimodal features
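
A minimal sketch, assuming PyTorch and not the released MOCAT code: modality-specific autoencoders produce latent features, the latents are stacked as a short token sequence, and multi-head self-attention mixes inter-/intra-modality information before a classification head. Input dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ModalityAE(nn.Module):
    def __init__(self, in_dim, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)       # the reconstruction is used as a pre-training target

mrna, mirna, meth = torch.randn(8, 1000), torch.randn(8, 300), torch.randn(8, 800)
aes = nn.ModuleList([ModalityAE(d) for d in (1000, 300, 800)])
latents = [ae(x)[0] for ae, x in zip(aes, (mrna, mirna, meth))]

tokens = torch.stack(latents, dim=1)                 # (batch, 3 modalities, 64)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
mixed, _ = attn(tokens, tokens, tokens)              # inter-/intra-modality interactions
logits = nn.Linear(3 * 64, 5)(mixed.flatten(1))      # e.g. 5 BRCA subtypes
print(logits.shape)                                  # torch.Size([8, 5])
```
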
[Feb. 2024] Integrated multiomic profiling of breast cancer in the Chinese population reveals patient stratification and therapeutic vulnerabilities, Nature Cancer

Paper

  • Cancer: Breast Cancer
  • Modalities: Genomic, Transcriptomic, Proteomic, Metabolomic, Radiomic, and Digital pathological characteristics
  • Data Source: In-House
  • Patients: A total of 773 patients with breast cancer nationwide from China who were treated at Fudan University Shanghai Cancer Center between 2013 and 2014
  • Pipeline:
    • most of the content analyzes multimodal data from a clinical perspective
    • using OneHotEncoder in scikit-learn to incorporate multimodal discrete features
    • employing a Cox proportional hazards model to predict outcomes
  • Fusion Mode: Middle-fusion, using OneHotEncoder to incorporate multimodal features
[Sep. 2023] Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias, NeurIPS

Paper Code

  • Cancer: Non-Cancer, with experiments across 5 medical image tasks and 10 datasets encompassing over 30 diseases
  • Modalities: Radiological Images (CXR images), Free-text Data (radiology reports)
  • Data Source: MIMIC-CXR, PadChest
  • Patients: Pre-training on approximately 220k image-text pairs for MIMIC-CXR and 160k pairs for PadChest, then applied to four downstream tasks: medical image linear classification, medical image zero-shot classification, medical image semantic segmentation, and medical image object detection
  • Pipeline:
    • for free-text data, using the cross-lingual medical LM to align different languages
    • for CXR images, using contrastive learning to align image features (applying random augmentations to the original images to create augmented views as positive samples while treating the rest of the images in the mini-batch as negative samples)
    • following CLIP, contrastive learning is used to align vision-language features (see the sketch below)
    • introducing Cross-lingual Text Alignment Regularization (CTR) to learn language-independent text representations and neutralize the adverse effects of community bias on other modalities
  • Fusion Mode: Middle-fusion, aligning different modalities' features within hidden space
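
A minimal sketch, assuming PyTorch, of the CLIP-style vision-language alignment step: image and report embeddings are projected into a shared space and trained with a symmetric InfoNCE loss over in-batch pairs. The encoder outputs below are random placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

img_emb = torch.randn(32, 512)                    # CXR encoder output (placeholder)
txt_emb = torch.randn(32, 768)                    # report encoder output (placeholder)

img_proj, txt_proj = nn.Linear(512, 256), nn.Linear(768, 256)
temperature = 0.07

zi = F.normalize(img_proj(img_emb), dim=-1)
zt = F.normalize(txt_proj(txt_emb), dim=-1)
logits = zi @ zt.t() / temperature                # pairwise image-text similarities
targets = torch.arange(32)                        # the i-th image matches the i-th report

loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
print(loss.item())
```
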
[Mar. 2023] HGIB: Prognosis for Alzheimer's Disease via Hypergraph Information Bottleneck, arXiv

Paper

  • Cancer: Non-Cancer, predicting Alzheimer's disease prognosis
  • Modalities: Radiological Images (MRI and PET), Non-imaging Information
  • Data Source: Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (adni.loni.usc.edu)
  • Patients: 248 patients with complete three modalities from ADNI-2
  • Pipeline:
    • using different pre-trained backbones to extract features from different modalities
    • for each modality, building a corresponding hypergraph, whose hyperedge represents the relationship between a subset of the patients, then concatenating all hypergraphs to generate the final hypergraph
    • employing hypergraph convolution to aggregate messages in the hypergraph
    • applying the hypergraph information bottleneck (HGIB), which requires the node representations to minimize the information retained from the hypergraph-structured data while maximizing the information needed for prognostic prediction
  • Fusion Mode: Middle-fusion, concatenating hypergraphs from different modalities and employing hypergraph convolution and hypergraph information bottleneck (HGIB) to integrate multimodal information
[Oct. 2022] ICSDA: a multi-modal deep learning model to predict breast cancer recurrence and metastasis risk by integrating pathological, clinical and gene expression data, Briefings in Bioinformatics

Paper

  • Cancer: Breast Cancer
  • Modalities: Pathological Images (H&E), Clinical Information (TNM staging, clinical staging, age, axillary lymph node metastasis), Gene Data
  • Data Source: TCGA
  • Patients: 196 patients, divided into the training and testing sets with a ratio of 7:3, in which the distributions of the samples were kept between the two datasets by hierarchical sampling
  • Pipeline:
    • applying feature selection to select features from clinical information and sequencing data
    • employing ResNet18 to extract deep image features within the tissue area of the H&E images (patching WSIs into tiles); an attention module is then used to aggregate the patches' features into a final pathological image deep feature
    • concatenating the pathological image deep feature, sequencing data and clinical data and then predicting prognosis via FC layers (sketched below)
  • Fusion Mode: Middle-fusion, concatenating different modalities' features
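
A minimal sketch, assuming PyTorch, of the aggregation and fusion steps: patch-level CNN features are pooled into a slide-level feature with a small attention module, then concatenated with clinical and gene features before fully connected prediction layers. Feature dimensions are placeholders.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, patches):                    # patches: (n_patches, dim)
        w = torch.softmax(self.attn(patches), dim=0)
        return (w * patches).sum(dim=0)            # slide-level feature: (dim,)

patch_feats = torch.randn(200, 512)                # ResNet18 features of H&E tiles
slide_feat = AttentionPooling()(patch_feats)

clinical = torch.randn(10)                         # selected clinical variables
gene = torch.randn(30)                             # selected expression features
fused = torch.cat([slide_feat, clinical, gene])    # middle fusion by concatenation

risk = nn.Sequential(nn.Linear(fused.numel(), 64), nn.ReLU(), nn.Linear(64, 1))(fused)
print(risk.shape)                                  # torch.Size([1]): recurrence/metastasis risk
```
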
⭐️ [Aug. 2022] Pan-cancer integrative histology-genomic analysis via multimodal deep learning, Cancer Cell

Paper Code

  • Cancer: Pan-cancer, including 14 cancer types
  • Modalities: Pathological H&E WSIs, Molecular profile data
  • Data Source: TCGA
  • Patients: 6592 gigapixel WSIs from 5720 patient samples across 14 cancer types from the TCGA
  • Pipeline:
    • using attention-based MIL to extract WSIs' features
    • using MLPs to extract molecular profile data features
    • employing pathomic fusion to integrate dual modalities' features
    • using Shapley Additive Explanation (SHAP)-styled attribution decision plots to visualize the attribution weight and direction of each molecular feature
  • Fusion Mode: Middle-fusion, employing pathomic fusion to integrate multimodal features
[Jun. 2022] Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer, Nature Cancer

Paper Code

  • Cancer: Ovarian Cancer
  • Modalities: Radiological CTs, Pathological images, Clinical data
  • Data Source: MSKCC, TCGA-OV
  • Patients: 444 patients, including 296 patients treated at the Memorial Sloan Kettering Cancer Center (MSKCC) and 148 patients from The Cancer Genome Atlas Ovarian Cancer (TCGA-OV); 40 test cases were randomly sampled from the entire pool of patients with all data modalities available for analysis, and the remaining 404 patients were used for training
    • 404 training patients: 243 had H&E WSIs, 245 had adnexal lesions on pre-treatment CE-CT, 251 had omental implants on pre-treatment CE-CT
    • 40 test patients: all had omental lesions on CE-CT, H&E WSIs
  • Pipeline:
    • using PyRadiomics for Radiological CTs; pre-training a ResNet-18 as histopathological tissue-type classifier and for extracting cell type features and tissue-type features; encoding clinical data as binary variables or one-hot categorical variables
    • using univariate Cox proportional hazards model to select features
    • employing a multivariable Cox model for late fusion
  • Fusion Mode: Late-fusion, using a multivariate Cox model to integrate unimodal submodelsโ€™ predictions
[Oct. 2022] Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data (CLIMATv2), IEEE Transactions on Medical Imaging (TMI)

Paper Code

  • Cancer: Non-Cancer, predicting the development of structural knee osteoarthritis changes and forecasting Alzheimer's disease clinical status
  • Modalities: Imaging Data (MRI, PET, ...) and Non-Imaging Data (Clinical evaluation, neuropsychological tests, genetic testing, ...)
  • Data Source: Osteoarthritis Initiative (OAI) cohort; Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort
  • Patients: 4796 patients for knee OA structural prognosis prediction; 2577 patients for AD clinical status prognosis prediction
  • Pipeline:
    • a transformer-based radiologist block to extract imaging features (the agent acts as a radiologist)
    • a transformer-based context block to extract non-imaging features
    • concatenating imaging features and non-imaging features, then employing a transformer-based general practitioner block to fuse multimodal features (the agent acts as a general practitioner)
    • the prognostic predictions are temporal, and the first time-point's prognostic prediction is required to be consistent with the diagnostic prediction
  • Fusion Mode: Middle-fusion, concatenating imaging features and non-imaging features and employing a transformer to fuse multimodal features
[Apr. 2022] CLIMAT: Clinically-Inspired Multi-Agent Transformers for Knee Osteoarthritis Trajectory Forecasting (CLIMAT), ISBI

Paper Code

  • Cancer: Non-Cancer, forecasting knee osteoarthritis trajectories
  • Modalities: Imaging Data (X-ray) and Non-Imaging Data (clinical variables such as age, sex, BMI, injury history, surgery history, and total Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) score)
  • Data Source: Osteoarthritis Initiative (OAI) cohort
  • Patients: 4796 patients for knee OA structural prognosis predictions
  • Pipeline: The pipeline is similar to CLIMATv2, but does not do the first time-point's prognostic and diagnostic predictions consistency measures.
  • Fusion Mode: Middle-fusion, concatenating imaging features and non-imaging features and employing a transformer to fuse multimodal features
[Apr. 2022] A Multi-modal Fusion Framework Based on Multi-task Correlation Learning for Cancer Prognosis Prediction (MultiCoFusion), Artificial Intelligence in Medicine

Paper

  • Cancer: Brain Lower Grade Glioma, Glioblastoma Multiforme
  • Modalities: Pathological images, Gene (mRNA)
  • Data Source: TCGA-LGG, TCGA-GBM
  • Patients: 470 patients
    • For pathological images, a pre-proposed dataset, consisting of 954 ROIs from WSIs for 470 patients
    • For gene data, one patient (TCGA-06-0152) is missing mRNA expression data, and the remaining 469 patients contain 953 mRNA samples. For cancer grade classification, the samples are Grade II (393), III (408), and IV (152). Each mRNA expression profile has 10673 genes.
    • 80% for training and 20% for testing
  • Pipeline:
    • pre-trained ResNet-152 for histopathological images; a sparse graph convolutional network (SGCN) for mRNA expression data
    • fusing these representations by a FCN
    • the fused FCN is a multi-task shared network, outputting survival analysis and cancer grade classification simultaneously
  • Fusion Mode: Middle-fusion
⭐️ [Dec. 2021] Prediction of HER2-positive breast cancer recurrence and metastasis risk from histopathological images and clinical information via multimodal deep learning, Computational and Structural Biotechnology Journal

Paper Code

  • Cancer: Breast cancer
  • Modalities: Histopathological images, clinical information
  • Data Source: In-house, TCGA
  • Patients: 127 HER2-positive breast cancer patients with known recurrence and metastasis status from the Cancer Hospital of the Chinese Academy of Medical Sciences (in-house dataset); 123 HER2-positive breast cancer patients with available H&E images and known recurrence and metastasis status in The Cancer Genome Atlas (TCGA)
  • Pipeline:
    • dividing histological images into patches and using CNNs for feature extraction
    • integrating image features and clinical features through multimodal compact bilinear (MCB)
    • using an output layer to predict risk scores
  • Fusion Mode: Middle-fusion, using MCB to integrate multimodal features
⭐️ [Oct. 2021] Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images, ICCV

Paper Code

  • Cancer: Five cancer types
  • Modalities: Pathological images (WSIs), Gene
  • Data Source: TCGA, the five largest cancer datasets from TCGA
  • Patients: Bladder Urothelial Carcinoma (BLCA) (n = 437), Breast Invasive Carcinoma (BRCA) (n = 1022), Glioblastoma & Lower Grade Glioma (GBMLGG) (n = 1011), Lung Adenocarcinoma (LUAD) (n = 515), and Uterine Corpus Endometrial Carcinoma (UCEC) (n = 538).
  • Pipeline:
    • dividing WSIs into 256x256 patches, and extracting instance-level patch embeddings
    • mapping genomic features into genomic embeddings
    • applying co-attention to enhance instance-level patch embeddings into genomic-guided WSI embeddings (Query = gene embeddings; Key & Value = instance-level patch embeddings), as sketched below
    • employing a Transformer to integrate instance-level embeddings into bag-level embeddings
    • concatenating two-modal features and predicting risk scores
  • Fusion Mode: Middle-fusion, concatenating multimodal features
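
A minimal sketch, assuming PyTorch and in the spirit of the co-attention step (not the released code): genomic embeddings act as queries over the instance-level patch embeddings, yielding genomic-guided WSI embeddings that are pooled and concatenated for risk prediction. Shapes and the number of gene groups are placeholders.

```python
import torch
import torch.nn as nn

patch_emb = torch.randn(1, 3000, 256)      # instance-level embeddings of 256x256 patches
gene_emb = torch.randn(1, 6, 256)          # embeddings of 6 genomic signature groups

co_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
guided_wsi, attn_w = co_attn(query=gene_emb, key=patch_emb, value=patch_emb)

# pool genomic-guided WSI embeddings and genomic embeddings, then concatenate
fused = torch.cat([guided_wsi.mean(dim=1), gene_emb.mean(dim=1)], dim=-1)   # (1, 512)
risk = nn.Linear(512, 1)(fused)            # survival risk score
print(risk.shape, attn_w.shape)            # torch.Size([1, 1]) torch.Size([1, 6, 3000])
```
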
[Dec. 2020] Brain Tumor Survival Prediction using Radiomics Features, MICCAI

Paper

  • Cancer: Brain Tumor
  • Modalities: MRI-T1-weighted, MRI-T2-weighted, T1-contrast enhanced, FLAIR
  • Data Source: BraTS 2019
  • Patients: 259 subjects diagnosed with HGG and 76 subjects diagnosed with LGG, along with ground-truth annotations by experts. The data comprise MRI images of four modalities from 19 different institutions
  • Pipeline:
    • extracting image slices corresponding to tumor regions from multiple MRI modalities
    • extracting radiomics features (i.e. first-order statistics, shape features, and texture features) from these 2D slices
    • training machine learning classifiers (i.e., KNN, SVM, DT, RF, and DA) to make prognostic predictions
  • Fusion Mode: Middle-fusion, using machine learning classifiers to integrate multimodal features from multiple MRIs
⭐️ [Sep. 2020] Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis, TMI

Paper Code

  • Cancer: Glioma, Clear Cell Renal Cell Carcinoma
  • Modalities: Pathological Images, Gene Data (mutations, CNV, RNA-Seq)
  • Data Source: TCGA-GBM, TCGA-LGG
  • Patients: 769 patients
  • Pipeline:
    • using CNNs, parameter efficient GCNs or a combination of the two to extract histology features
    • using a feed-forward network to extract genomic features
    • first training unimodal networks for the respective image and genomic features individually for the corresponding supervised learning task, then using them as feature extractors for multimodal fusion
    • multimodal fusion is performed by applying a gating-based attention mechanism to first control the expressiveness of each modality, followed by the Kronecker product to model pairwise feature interactions across modalities
    • finally, using cox model for survival analysis and the FC layers for classification
  • Fusion Mode: Middle-fusion, employing a gating-based attention mechanism followed by a Kronecker product to integrate multimodal features (see the sketch below)
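
A minimal sketch, assuming PyTorch, of gating-based attention followed by a Kronecker product in the spirit of Pathomic Fusion (not the released code): each modality is gated conditioned on both modalities, a constant 1 is appended so unimodal terms survive, and the outer product of the gated vectors models pairwise cross-modal interactions.

```python
import torch
import torch.nn as nn

dim = 32
h_path = torch.randn(4, dim)                      # histology features (placeholder)
h_gene = torch.randn(4, dim)                      # genomic features (placeholder)

gate_path = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
gate_gene = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

joint = torch.cat([h_path, h_gene], dim=-1)
z_path = gate_path(joint) * h_path                # attention-gated histology
z_gene = gate_gene(joint) * h_gene                # attention-gated genomics

ones = torch.ones(4, 1)
z_path = torch.cat([z_path, ones], dim=-1)        # append 1 to keep unimodal terms
z_gene = torch.cat([z_gene, ones], dim=-1)

# batched outer (Kronecker-style) product -> pairwise cross-modal interactions
fusion = torch.einsum("bi,bj->bij", z_path, z_gene).flatten(1)   # (4, (dim+1)^2)
risk = nn.Linear((dim + 1) ** 2, 1)(fusion)       # survival / classification head on top
print(fusion.shape, risk.shape)
```
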
[Jan. 2020] PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data, Pacific Symposium on Biocomputing

Paper Code

  • Cancer: Glioblastoma Multiforme
  • Modalities: Pathological Images (WSIs), Gene Data, Clinical Data
  • Data Source: TCGA, TCIA
  • Patients: 447 GBM patients
  • Pipeline:
    • patching WSIs into patches; a patch-wise pre-trained CNN is used to extract pathological features; then the pathology hidden layer is used to aggregate these features as the input of the Cox layer
    • gene features are extracted by a series of layers, including the gene layer, pathway layer, and H1 and H2 layers
    • clinical features are extracted by the clinical layer
    • these three modalities' features are concatenated as the input of the Cox layer for prediction
  • Fusion Mode: Middle-fusion, concatenating multimodal features and using Cox layer for survival analysis
[Jan. 2020] Integrative analysis of cross-modal features for the prognosis prediction of clear cell renal cell carcinoma, Bioimage informatics

Paper Code

  • Cancer: Clear Cell Renal Cell Carcinoma
  • Modalities: Radiological Images (CTs), Pathological Images, Gene Data, Clinical Information
  • Data Source: TCGA
  • Patients: 209 patients, randomly divided into training (n=139, 66.51%) and testing cohorts (n=70, 33.49%)
  • Pipeline:
    • selecting genes by their variation coefficients and employing the weighted gene co-expression network analysis (WGCNA) for gene analysis
    • using two CNNs with same structure to extract deep features from CT and histopathological images
    • using a parameter-free multivariate feature selection method (called the block filtering post-pruning search (BFPS) algorithm) for feature selection; then applying a further feature selection to the combination of the selected CT features, histopathological features and eigengenes for prognostic prediction via the Cox model
  • Fusion Mode: Middle-fusion, combining the selected CT features, histopathological features and eigengenes
[Oct. 2019] An Online Calculator for the Prediction of Survival in Glioblastoma Patients Using Classical Statistics and Machine Learning, RESEARCH-HUMAN-CLINICAL STUDIES

Paper Code

  • Cancer: Glioblastoma
  • Modalities: Clinical information, including continuous variables (age, tumor diameter, ...), categorical variables (sex, race, ...)
  • Data Source: Surveillance Epidemiology and end results (SEER) dataset (2005-2015)
  • Patients: in total 20821 patients, split into a training and hold-out test set in an 80/20 ratio
  • Pipeline:
    • for censored survival data, using Cox proportional hazards regression (CPHR) and accelerated failure time (AFT) algorithms
    • for predictive analysis, using 15 machine learning and statistical algorithms
  • Fusion Mode: Early-fusion, taking continuous variables and categorical variables as inputs; effectively, it acts as a multivariable analysis
[Jul. 2019] Deep learning with multimodal representation for pancancer prognosis prediction, Bioinformatics

Paper Code

  • Cancer: Pancancer
  • Modalities: Clinical Data, Gene (mRNA, microRNA), Pathological Images (WSIs)
  • Data Source: TCGA
  • Patients: 11160 patients, split into training and testing datasets in 85/15 ratio
  • Pipeline:
    • for the clinical data, using FC layers with sigmoid activations
    • for the genomic data, using deep highway networks
    • for the WSI images, using the SqueezeNet
    • developing an unsupervised encoder (metric learning) to compress different modalities into a single feature vector for each patient (maximizing cosine similarity between positive samples while minimizing cosine similarity between negative samples)
    • handling missing data through a resilient, multimodal dropout method
    • averaging different modalities' features into a single 512-dimensional feature vector and using a prediction layer for survival prediction
  • Fusion Mode: Middle-fusion, aligning modalities first and then averaging their features (see the sketch below)
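
A minimal sketch, assuming PyTorch, of the multimodal dropout and averaging steps (not the authors' code): during training, whole modalities are randomly dropped and the remaining ones re-weighted, then the per-modality feature vectors are averaged into a single patient representation for the survival head.

```python
import torch

def multimodal_dropout(feats, p_drop=0.25, training=True):
    # feats: (batch, n_modalities, dim) -> (batch, dim), mean over the kept modalities
    if not training:
        return feats.mean(dim=1)
    keep = (torch.rand(feats.shape[:2]) > p_drop).float()
    keep[keep.sum(dim=1) == 0, 0] = 1.0            # make sure at least one modality survives
    kept_sum = (feats * keep.unsqueeze(-1)).sum(dim=1)
    return kept_sum / keep.sum(dim=1, keepdim=True)

clin, gene, wsi = torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512)
patient_vec = multimodal_dropout(torch.stack([clin, gene, wsi], dim=1))
print(patient_vec.shape)                           # torch.Size([8, 512]), fed to the survival head
```
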
[Dec. 2017] Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma, Cell Systems

Paper

  • Cancer: Lung Adenocarcinoma
  • Modalities: Pathological Images, Pathological Reports, Gene (RNA sequencing), Proteomics
  • Data Source: TCGA
  • Patients: 538 patients
  • Pipeline:
    • converting pathological images into overlapping tiles and selecting the ROIs to extract quantitative features (i.e., size, shape, intensity distribution, and texture features); identifying pathology grade from pathology reports; collecting gene and protein expression data by RNA sequencing and reverse-phase protein array
    • employing feature selection on the training set
    • building a random forest model for prognostic prediction
  • Fusion Mode: Middle-fusion, using a random forest model to integrate multimodal features

Others

| Year | Paper | Code | Cancer | Modalities | Data Source | Patients | Fusion Mode |
|------|-------|------|--------|------------|-------------|----------|-------------|
| 2024 | 🔗 | | Non-Cancer | Rad | In-House | | |
| 2023 | 🔗 | 🔗 | SPN | Rad, Clin | NLST, EHR-Pulmonary, Image-EHR, In-House | 2668 (public), 1449 (in-house) | Middle |
| 2023 | 🔗 | 🔗 | Non-Cancer | X-rays, Text | In-House | 51511 | Middle |
| 2023 | 🔗 | | LUAD | Rad, Clin | In-House | 199 | Middle |
| 2022 | 🔗 | 🔗 | Non-Cancer | Rad, Clin | MIMIC | >40,000 | Middle |
| 2022 | 🔗 | 🔗 | Brain | MRIs | BraTS 2018 | 285 | Middle |
| 2021 | 🔗 | | Non-Cancer | MRIs, PETs | ADNI | 820 | Middle |
| 2021 | 🔗 | | Non-Cancer | MRI, Gene, Clin | ADNI | 2004 | Middle |
| 2020 | 🔗 | 🔗 | Pancreas | CyTOF, m-IHC, scRNA-seq | In-House | 18+105+19 | N/A |
[Feb. 2024] Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology, arXiv

Paper

  • Cancer: Non-Cancer
  • Modalities: Radiological Images
  • Data Source: In-House
  • Contribution: The first investigation into the potential utility and design requirements for leveraging vision-language model (VLM) capabilities with 13 radiologists and clinicians, in the context of radiology, across four tasks: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights.
[Jun. 2023] Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification, arXiv

Paper Code

  • Cancer: Solitary Pulmonary Nodule (SPN)
  • Modalities: Radiological Images (chest CTs), Clinical Data (EHR)
  • Data Source: NLST, EHR-Pulmonary (the unlabeled dataset used to learn clinical signatures in an unsupervised manner), Image-EHR (a labeled dataset with paired imaging and EHRs), In-House dataset
  • Patients: The classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from the EHRs of the authors' home institution.
  • Pipeline:
    • learning independent latent signatures in an unsupervised manner on a large non-imaging cohort (non-imaging features)
    • extracting longitudinal deep image features from CTs via a CNN (imaging features)
    • token embeddings are derived from the signatures (non-imaging features) and from imaging (imaging features); a fixed positional embedding indicates the token's position in the sequence; a learnable segment embedding indicates the imaging or non-imaging modality
    • self-attention is used to integrate the multimodal and longitudinal features
  • Fusion Mode: Middle-fusion, using self-attention to integrate multimodal features
[Jun. 2023] A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics (IRENE), Nature Biomedical Engineering

Paper Code

  • Cancer: Non-Cancer, predicting the adverse clinical outcomes in patients with COVID-19
  • Modalities: Chest X-rays, Unstructured Text (i.e. chief complaint, history of present and past illness, and a complete laboratory test report), Structured Text (i.e. demographics)
  • Data Source: In-house dataset from West China Hospital
  • Patients: 51511 patients with 72283 data samples
    • 44628 patients for training and 3325 patients for testing
  • Pipeline:
    • tokenizing unstructured text into tokens
    • mapping structured text into tokens via linear projection
    • tokenizing images into tokens
    • using the proposed bidirectional multimodal attention block followed by several self-attention blocks for multimodal fusion
    • a classification head for predicting disease
  • Fusion Mode: Middle-fusion
[Feb. 2023] Development and evaluation of an integrated model based on a deep segmentation network and demography-added radiomics algorithm for segmentation and diagnosis of early lung adenocarcinoma, Computerized Medical Imaging and Graphics

Paper

  • Cancer: Lung Adenocarcinoma
  • Modalities: Radiological Images (CT), Clinical Data
  • Data Source: In-House
  • Patients: A total of 199 GGN cases, consisting of 168 GGN cases for developing the model and the remaining 31 independent cases for validation
  • Pipeline:
    • first, a deep segmentation model is utilized to locate GGNs in CTs and to help categorize the lesions with a subsequently applied classification model
    • then, 1690 quantitative image features are extracted from the lesions via PyRadiomics, and 28 features from the CT settings (i.e., device and modality settings), patients' general characteristics (i.e., age, sex, smoking status), and references are added
    • reducing and selecting the features
    • using a classifier to make prediction
  • Fusion Mode: Middle-fusion, concatenating CT radiomics features and clinical data and the settings of CTs within feature space
[Dec. 2022] Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis, arXiv

Paper Code

  • Cancer: Non-Cancer, focus on intensive care and ophthalmology walk-ins
  • Modalities: Radiological Images (chest radiographs, fundoscopy images), Clinical Data
  • Data Source: MIMIC dataset
  • Patients: MIMIC database comprises retrospectively collected image and non-image data of over 40,000 patients admitted to an intensive care unit or the emergency department at the Beth Israel Deaconess Medical Center between 2008 and 2019.
    • The authors follow the previous work and extract imaging and non-imaging information from the MIMIC-IV and MIMIC-CXR-JPG database resulting in a subset of 45,676 samples from n=36,542 patients
    • The internal dataset of chest radiographs consisting of 193,556 samples (n=45,016 patients) is thus split into a training set of 122,294 samples (n=28,809 patients), validation set of 31,243 samples (n=7,203 patients) and a test set of 40,028 samples (n=9,004 patients).
    • The fundoscopy dataset, comprising 3,860 samples (n=1,930 patients), is split into a training set of 2,586 samples (n=1,293 patients), a validation set of 502 samples (n=251 patients) and a test set of 772 samples (n=386 patients).
  • Pipeline:
    • using a transformer encoder (similar to ViT) to tokenize and encode imaging data into visual tokens (imaging features)
    • using learnable tokens as queries, with the clinical parameters as keys and values, and employing cross-attention to extract clinical information from the clinical parameters into the learnable tokens (non-imaging features)
    • the output learnable tokens and the visual tokens are passed through the transformer encoder, and then the class token is used to make the prediction via an MLP
  • Fusion Mode: Middle-fusion, using a transformer encoder to integrate imaging and non-imaging features
[Sep. 2022] mmFormer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation, MICCAI

Paper Code

  • Cancer: Brain Tumor
  • Modalities: MRIs (FLAIR, T1c, T1, T2)
  • Data Source: BraTS 2018
  • Patients: 285 multi-contrast MRI scans
  • Pipeline:
    • using modality-specific encoders to extract modality-specific features within each modality
    • employing an inter-modal transformer to build and align the long-range correlations across modalities
    • a decoder performs a progressive up-sampling and fusion with the modality-invariant features to generate robust segmentation
  • Fusion Mode: Middle-fusion, using an inter-modal transformer to integrate multimodal features
[Mar. 2021] Relation-Induced Multi-Modal Shared Representation Learning for Alzheimer's Disease Diagnosis, TMI

Paper

  • Cancer: Non-Cancer, predicting Alzheimer's disease diagnosis
  • Modalities: Radiological Images (MRIs, PETs)
  • Data Source: ADNI
  • Patients: A total of 820 patients, consisting of 93 AD, 99 NC, 121 sMCI, and 79 pMCI from ADNI-1 and 136 AD, 107 NC, 103 sMCI, and 82 pMCI from ADNI-2.
  • Pipeline:
    • learning a bi-directional mapping (including projection matrix P and reconstruction matrix Q) to obtain the shared representation matrix U between original space and shared space
    • within this shared space, utilizing several relational regularizers (including feature-feature, feature-label, and sample-sample regularizers) as auxiliary regularizers to encourage learning underlying associations inherent in multi-modal data and alleviate overfitting
    • projecting the shared representations into the target space for AD diagnosis
  • Fusion Mode: Middle-fusion, learning a shared-representation across different modalities
[Feb. 2021] Multimodal deep learning models for early detection of Alzheimer's disease stage, Scientific Reports

Paper

  • Cancer: Non-Cancer, early detection of Alzheimer's disease stage
  • Modalities: Radiological Images (MRI), Gene Data (single nucleotide polymorphisms (SNPs)), Clinical Data
  • Data Source: ADNI dataset
  • Patients: ADNI dataset contains SNP (808 patients), MRI imaging (503 patients), and clinical and neurological test data (2004 patients)
  • Pipeline:
    • using stacked denoising auto-encoders to extract features from clinical and genetic data
    • using 3D CNNs for imaging data
    • developing a novel data interpretation method to identify top-performing features learned by the deep-models with clustering and perturbation analysis
  • Fusion Mode: Middle-fusion, concatenating multimodal features and then using a classification layer for prediction
[Nov. 2020] Multimodal mapping of the tumor and peripheral blood immune landscape in human pancreatic cancer, Nature cancer

Paper Code

  • Cancer: Pancreatic cancer
  • Modalities: CyTOF, single-cell RNA sequencing, and multiplex immunohistochemistry
  • Data Source: In-house
  • Patients:
    • CyTOF: 10 PDA samples / 8 control samples
    • m-IHCs: 71 PDA and 34 chronic pancreatitis samples
    • single-cell RNA-sequencing: 16 PDA samples / 3 control samples (in total, 8,541 cells were sequenced from adjacent/normal samples and 46,244 cells from PDA, while from the blood samples 14,240 cells were sequenced from four healthy subjects and 55,873 cells from 16 patients with PDA)
  • Pipeline: Performing biological analysis for different modalities respectively
  • Fusion Mode: N/A

Related Reviews

[Apr. 2023] Deep multimodal fusion of image and non-image data in disease diagnosis and prognosis: a review, Progress in Biomedical Engineering

Paper

Content:

  • Data Modalities: Image data (pathology images, radiology images, camera images); Non-image data (structured data, free-text data)
  • Multimodal fusion methods: Operation-based; Subspace-based; Attention-based; Tensor-based; Graph-based

View points:

  • It is difficult to compare the performance of different methods directly, since different studies were typically done on different datasets with different settings.
  • There is no evidence that one fusion method always performs the best; the optimal fusion method might be task/data dependent.
  • Fusing multi-modal data typically surpassed the uni-modal counterparts in the downstream tasks, but on the other hand, some studies also mentioned that a model that fuses more modalities may not always perform better than one with fewer modalities (I think the reason is that the modality fusion was not done well)
  • Deep-learning methods require a large amount of training data; however, data scarcity, especially of multimodal data, is a challenge in the healthcare area.
  • Unimodal feature extraction is an essential prerequisite for fusion, especially given multimodal heterogeneity.
  • Explainability is a challenge in multimodal diagnosis and prognosis.
⭐️ [Oct. 2022] Artificial intelligence for multimodal data integration in oncology, Cancer Cell

Paper

Content:

  • AI methods in oncology
    • Supervised methods
      • Hand-crafted methods
        • 👍: simpler architecture, lower computation cost, may require less training data, and better interpretability
        • 👎: time consuming, translates human bias to the models
      • Representation learning methods
        • 👍: their ability to extract rich feature representations from raw data, resulting in lower preprocessing cost, higher flexibility, and often superior performance over hand-crafted methods
        • 👎: reliance on pixel-level annotations, lack of interpretability
    • Weakly supervised methods: these methods can reduce the cost of data preprocessing and mitigate bias and interrater variability; additionally, they are free to learn from the entire scan, which can identify predictive features even beyond the regions typically evaluated by clinicians.
      • Graph convolutional networks
        • 👍: can incorporate larger context and spatial tissue structure
        • 👎: higher training costs and memory requirements (since the nodes cannot be processed independently)
      • Multiple-instance learning
        • 👍: no fine annotation is required
        • 👎: overlooks correlations between patches
      • Vision transformers
        • 👍: fully context aware; consider patches' correlations and context; consider spatial structure or relative distances between patches via positional encoding
        • 👎: tend to be more data hungry
    • Unsupervised methods
      • Self-supervised methods
        • 👍: can learn general-purpose features, which can be beneficial for other practical tasks (transfer learning)
        • 👎: (Not mentioned in the paper)
      • Unsupervised feature analysis
        • 👍: can explore structure, similarity and common features across data points
        • 👎: (Not mentioned in the paper)
  • Multimodal data fusion
    • Early fusion
      • 👍: only one model is trained, simplifying the design process
      • 👎: requires a certain level of alignment or synchronization between the modalities
    • Late fusion (decision-level fusion)
      • 👍: allows one to use a different model architecture for each modality, making it suitable for systems with large data heterogeneity or modalities from different time points; able to cope with missing or incomplete data; suitable for weak interdependencies
      • 👎: unsuitable for strong interdependencies
    • Intermediate fusion
      • 👍: flexible (single-level fusion, gradual fusion, guided fusion)
    • There is no conclusive evidence that one fusion type is ultimately better than the others, as each type is heavily data and task specific.
  • Multimodal interpretability
    • Histopathology: map model architecture attention or probability scores to obtain slide-level attention heatmaps
    • Radiology: similar to the methods used in histology
    • Molecular data: use the integrated gradient method to analyze, which computes attribution values indicating how changes in specific inputs affect the model outputs
    • Multimodal models: all previously mentioned methods can be used in multimodal models to explore interpretability within each modality. Moreover, shifts in feature importance under unimodal and multimodal settings can be investigated to analyze the impact of the multimodal context.
    • While CAM- or attention-based methods can localize the predictive regions, they cannot specify which features are relevant, i.e., they can explain where but not why.
    • There is no guarantee that all high-attention/attribution regions carry clinical relevance. High scores just mean that the model has considered these regions more important than others.
  • Multimodal data interconnection
    • Morphologic associations
    • Non-invasive alternatives
    • Outcome associations
    • Early predictors
  • Challenges
    • Missing data
      • Synthetic data generation
      • Dropout-based methods
    • Data alignment
      • Alignment of similar modalities (e.g. MRI and PET brain scans)
      • Alignment of diverse modalities (e.g. data from different scales, timepoints, or measurements)
    • Transparency and prospective clinical trials
⭐️ [Sep. 2022] Multimodal biomedical AI, Nature Medicine

Paper

Content & View points:

  • Opportunities for leveraging multimodal data (applications)
    • Personalized 'omics' for precision health
    • Digital clinical trials
    • Remote monitoring: the 'hospital-at-home'
    • Pandemic surveillance and outbreak detection
    • Digital twins
    • Virtual health assistant
  • Multimodal data collection
| Study | Country | Year started | Data modalities | Access | Sample size |
|-------|---------|--------------|-----------------|--------|-------------|
| UK Biobank | UK | 2006 | Questionnaires, EHR/clinical, Laboratory, Genome-wide genotyping, WES, WGS, Imaging, Metabolites | Open access | ~500,000 |
| China Kadoorie Biobank | China | 2004 | Questionnaires, Physical measurements, Biosamples, Genome-wide genotyping | Restricted access | ~500,000 |
| Biobank Japan | Japan | 2003 | Questionnaires, Clinical, Laboratory, Genome-wide genotyping | Restricted access | ~200,000 |
| Million Veteran Program | USA | 2011 | EHR/clinical, Laboratory, Genome wide | Restricted access | 1 million |
| TOPMed | USA | 2014 | Clinical, WGS | Open access | ~180,000 |
| All of Us Research Program | USA | 2017 | Questionnaires, SDH, EHR/clinical, Laboratory, Genome wide, Wearables | Open access | 1 million (target) |
| Project Baseline Health Study | USA | 2015 | Questionnaires, EHR/clinical, Laboratory, Wearables | Restricted access | 10,000 (target) |
| American Gut Project | USA | 2012 | Clinical, Diet, Microbiome | Open access | ~25,000 |
| MIMIC | USA | 2008-2019 | Clinical/EHR, Images | Open access | ~380,000 |
| MIPACT | USA | 2018-2019 | Wearables, clinical/EHR, physiological, laboratory | Restricted access | ~6,000 |
| North American Prodrome Longitudinal Study | USA | 2008 | Clinical, Genetic | Restricted access | ~1,000 |
  • Technical challenges
    • How to leverage multiple different types of data and learn to relate these multiple modalities or combine them for improving prediction performance?
    • Another desirable feature for multimodal learning frameworks is the ability to learn from different modalities without the need for different model architectures.
    • Another important modeling challenge relates to the exceedingly high number of dimensions contained in multimodal health data, collectively termed 'the curse of dimensionality'.
    • Multimodal fusion is a general concept that can be tackled using any architectural choice.
    • Many other important challenges relating to multimodal model architectures remain (for example, how to extract features from three-dimensional imaging or whole-slide images)
  • Data challenges
    • Medical datasets are heterogeneous, which can be described along several axes, including the sample size, depth of phenotyping, the length and intervals of follow-up, the degree of interaction between participants, the heterogeneity and diversity of the participants, the level of standardization and harmonization of the data and the amount of linkage between data sources.
    • Achieving diversity across race/ethnicity, ancestry, income level, education level, healthcare access, age, disability status, geographic locations, gender and sexual orientation has proven difficult in practice.
    • Another frequent problem with biomedical data is the usually high proportion of missing data.
    • The risk of incurring several biases is important when conducting studies that collect health data, and multiple approaches are necessary to monitor and mitigate these biases.
  • Privacy challenges
    • The successful development of multimodal AI in health requires breadth and depth of data, which encompasses higher privacy challenges than single-modality AI models.
  • Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma
  • Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden
