The final goal is to classify pathology images, using text descriptions to improve performance. This project is still in progress.
- Use PdfMiner and PyPDF2 to extract text and images from PDF files. Here
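A minimal sketch of what this extraction step might look like, assuming pdfminer.six and PyPDF2 (>= 2.12, for `page.images`) are installed; the function names are illustrative, not from this repo, and the libraries are imported inside the helpers so the sketch loads without them:

```python
def extract_pdf_text(path):
    """Return all text in the PDF at `path` (uses pdfminer.six)."""
    from pdfminer.high_level import extract_text  # deferred import
    return extract_text(path)

def extract_pdf_images(path, out_dir):
    """Save every embedded image in the PDF to `out_dir` (uses PyPDF2)."""
    import os
    from PyPDF2 import PdfReader
    reader = PdfReader(path)
    saved = []
    for page_no, page in enumerate(reader.pages):
        for img in page.images:  # ImageFile objects with .name and .data
            out = os.path.join(out_dir, f"p{page_no}_{img.name}")
            with open(out, "wb") as f:
                f.write(img.data)
            saved.append(out)
    return saved
```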
- NLP analysis: implement text mining tutorials (preprocessing and word2vec in Gensim and NLTK). Here
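A rough sketch of this pipeline, hedged: `preprocess` and its stopword list are illustrative stand-ins for the NLTK tutorial steps, and Gensim's `Word2Vec` is only invoked inside the training helper:

```python
import re

# toy stopword list; a real run would use nltk.corpus.stopwords
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "with"}

def preprocess(text):
    """Lowercase, tokenize, and drop stopwords / very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def train_word2vec(sentences, dim=100):
    """Train a word2vec model on tokenized sentences (requires gensim)."""
    from gensim.models import Word2Vec  # deferred import
    return Word2Vec(sentences, vector_size=dim, min_count=1, workers=1)
```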
- Img2Vec: use pre-trained (or non-pre-trained) models in PyTorch to extract vector embeddings from any image and compute their similarity. Here
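The similarity half is just cosine similarity; the embedding half below is a hedged sketch of taking the penultimate-layer activations of a torchvision ResNet-18 (names are illustrative, and torch/torchvision are imported only inside the helper):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def image_embedding(pil_image):
    """512-d embedding from a pre-trained ResNet-18, final fc layer removed."""
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    tfm = T.Compose([T.Resize((224, 224)), T.ToTensor()])
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    net.fc = torch.nn.Identity()  # keep the penultimate activations
    net.eval()
    with torch.no_grad():
        return net(tfm(pil_image).unsqueeze(0)).squeeze(0).numpy()
```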
Images: ChestXray-NIHCC
Reports: [OpenI](https://openi.nlm.nih.gov) (3,643 unique front-view images and their corresponding reports are selected)
Papers that used this data: TieNet_CVPR2018, ChestX-ray8_CVPR2017spotlight
Papers that used this data:
- DEEP ATTENTIVE FEATURE LEARNING FOR HISTOPATHOLOGY IMAGE CLASSIFICATION
- A Dataset for Breast Cancer Histopathological Image Classification (presents an evaluation of different combinations of six visual feature descriptors with different classifiers)
- Deep Features for Breast Cancer Histopathological Image Classification
- Breast Cancer Histopathological Image Classification using Convolutional Neural Networks_IJCNN2016 --> presents results from a CNN on this dataset. Since CNNs generally require large datasets, the authors use the random-patches trick, which improves accuracy by about 4 to 6 percentage points.
- Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification
Reports: MIMIC
Images:
The original H&E-stained whole-slide images used in this work can be downloaded from the Genomic Data Commons. All TCGA molecular data can be obtained from the Genomic Data Commons, as well as derived data matrices of the PanCancer Atlas. Integration with immune signatures of the TCGA immune response working group is available through the CRI iAtlas web resource. Links to these data resources can be found on the accompanying publication manuscript page (https://gdc.cancer.gov/about-data/publications/tilmap).
These software resources, as well as the tumor-infiltrating lymphocyte (TIL) maps, are available on the Cancer Imaging Archive at https://doi.org/10.7937/K9/TCIA.2018.Y75F9W1
Papers that used this data:
Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images
This paper and [TieNet] were written by the same team.
- Explainable Prediction of Medical Codes from Clinical Text MyNotes
- Deep Visual-Semantic Alignments for Generating Image Descriptions (CVPR 2015)
- The following 3 papers are from the same team:
MyNotes --> presents an evaluation of different combinations of six visual feature descriptors with different classifiers.
MyNotes --> presents an evaluation of DeCAF features for breast cancer recognition (image classification).
Results: the main observation is that DeCAF features generally achieve better results than more traditional visual feature descriptors, and in some cases even outperform task-specific CNNs.
- Breast Cancer Histopathological Image Classification using Convolutional Neural Networks_IJCNN2016 --> presents results from a CNN on this dataset. Since CNNs generally require large datasets, the authors use the random-patches trick, which consists of extracting sub-images at both the training and test phases. During training, the training set is enlarged by extracting patches at randomly chosen positions; during testing, patches are extracted from a grid, each patch is classified, and the per-patch results are combined. The authors show that this approach improves accuracy by about 4 to 6 percentage points.
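The random-patches trick described above can be sketched as follows (numpy only; the function names are hypothetical and the per-patch classifier is left abstract):

```python
import numpy as np

def random_patches(img, size, n, rng):
    """Training time: extract n patches at randomly chosen positions."""
    h, w = img.shape[:2]
    ys = rng.integers(0, h - size + 1, n)
    xs = rng.integers(0, w - size + 1, n)
    return [img[y:y + size, x:x + size] for y, x in zip(ys, xs)]

def grid_patches(img, size):
    """Test time: extract non-overlapping patches on a regular grid."""
    h, w = img.shape[:2]
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def combine_patch_predictions(patch_probs):
    """Combine per-patch class probabilities by averaging (sum rule)."""
    return int(np.argmax(np.mean(patch_probs, axis=0)))
```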
- Privileged information
Deep Learning Under Privileged Information Using Heteroscedastic Dropout_CVPR2018
From the abstract: "We propose to utilize privileged information in order to control the variance of the Dropout. Since the Dropout's variance is not constant, we call this a Heteroscedastic Dropout. Our empirical and theoretical analysis suggests that Heteroscedastic Dropout significantly increases the sample efficiency of both CNNs and RNNs, resulting in higher accuracy with much less data."
Heteroscedastic dropout extends LUPI from SVM-based methods to CNN/RNN-based methods: additional (privileged) information is used during training to control the variance of the dropout. Since the dropout's variance is not constant, it is called heteroscedastic dropout.
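A toy numpy sketch of the idea (not the paper's implementation): multiplicative Gaussian dropout whose standard deviation varies per sample, here passed in directly where the paper would predict it from the privileged information:

```python
import numpy as np

def heteroscedastic_dropout(x, sigma, rng):
    """Multiplicative Gaussian dropout with per-sample noise std `sigma`.

    x:     (batch, features) activations
    sigma: (batch,) noise std, assumed derived from privileged info;
           larger sigma => noisier, more heavily regularized sample
    """
    noise = 1.0 + sigma[:, None] * rng.standard_normal(x.shape)
    return x * noise  # E[noise] = 1, so activations stay unbiased
```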
Datasets - Pretrained CNN/RNN Models
Learning to Rank Using Privileged Information
- NLP for pathology
Natural language processing in pathology: a scoping review_2016 --> reviewed and summarized the study objectives; the NLP methods used and their validation (word/phrase matching, probabilistic machine learning, and rule-based systems); software implementations; performance on the datasets used; and any reported use in practice. Covers publications through Sep. 2014.
Notably, little work has been done on breast pathology reports, despite the high incidence of breast cancer.
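To give a flavor of the simplest method category the review covers (word/phrase matching and rule-based systems), here is a toy classifier for pathology report text; the labels and phrase lists are made up for illustration only:

```python
RULES = [
    # checked in order, so the negated phrase can win before any overlap
    ("benign", ["no evidence of malignancy", "benign", "fibroadenoma"]),
    ("malignant", ["invasive", "carcinoma", "malignant"]),
]

def classify_report(text):
    """Return the first label whose phrase list matches, else 'unknown'."""
    t = text.lower()
    for label, phrases in RULES:
        if any(p in t for p in phrases):
            return label
    return "unknown"
```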
- Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification --> proposes a method to classify BC histopathology images that is independent of the magnification factor. Their experimental results are competitive with previous state-of-the-art results obtained from hand-crafted features.
- Vision-related work:
Show, attend and tell: Neural image caption generation with visual attention_ICML2015
Multi-level attention networks for visual question answering_CVPR2017
Pointer networks_NIPS2015Spotlight
Areas of attention for image captioning_ICCV2017
Dual attention networks for multimodal reasoning and matching_CVPR2017 ------ ChineseBlog
- Medical-image-related:
MDNet: a semantically and visually interpretable medical image diagnosis network_CVPR2017
- NLP-related work:
Neural machine translation by jointly learning to align and translate_ICLR2015
Encoding source language with convolutional neural network for machine translation_ACL-CoNLL2015
A neural attention model for abstractive sentence summarization_EMNLP2015
Not all contexts are created equal: Better word representations with variable attention_EMNLP2015
A structured self-attentive sentence embedding_ICLR2017
Learning natural language inference using bidirectional LSTM model and inner-attention_2016