Pathology Image Classification Assisted by an NLP Classifier

The final goal is to classify pathology images, using text descriptions to improve performance. The project is still in progress.

Work done so far

Use PDFMiner and PyPDF2 to extract text and images from PDF files.
NLP analysis: implement text-mining tutorials (preprocessing and word2vec) in Gensim and NLTK.
Img2Vec: use pre-trained (or non-pre-trained) models in PyTorch to extract vector embeddings from any image and calculate their similarity.
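The similarity step of Img2Vec can be sketched as follows. This is a minimal illustration that assumes cosine similarity is the measure (the notes above do not say which one is used) and that the embeddings are already available as plain lists of floats. The names `cosine_similarity` and `most_similar` are hypothetical helpers, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings):
    """Rank named embeddings from most to least similar to the query."""
    scores = [(name, cosine_similarity(query, vec))
              for name, vec in embeddings.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In practice the vectors would come from a model's feature layer. Here they are only assumed to be numeric sequences of equal length.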

Part of the Reading list:

Datasets

Images: ChestXray-NIHCC

Descriptions

Reports: [OpenI](https://openi.nlm.nih.gov) (3,643 unique front-view images and corresponding reports are selected)

Papers that used this data:

TieNet_CVPR2018, ChestX-ray8_CVPR2017spotlight

Images: Breast Cancer Histopathological Database (BreakHis)

Papers that used this data:

Deep Attentive Feature Learning for Histopathology Image Classification
A Dataset for Breast Cancer Histopathological Image Classification (presents an evaluation of different combinations of six visual feature descriptors with different classifiers.)
Deep Features for Breast Cancer Histopathological Image Classification
Breast Cancer Histopathological Image Classification using Convolutional Neural Networks_IJCNN2016 --> presents results from a CNN on this dataset. Because CNNs generally require large datasets, the authors use the random-patches trick. With this approach, accuracy improves by about 4 to 6 percentage points.
Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification

Reports: MIMIC
Images:

The original H&E stained whole-slide images used in this work can be downloaded from the Genomic Data Commons. All TCGA molecular data can be obtained from the Genomic Data Commons, as well as derived data matrices of the PanCancer Atlas. Integration with immune signatures of the TCGA immune response working group is available through the CRI iAtlas web resource. Links to these data resources can be found at the accompanying publication manuscript page (https://gdc.cancer.gov/about-data/publications/tilmap).

These different software resources as well as the tumor-infiltrating lymphocyte (TIL) maps are available on the Cancer Imaging Archive, at: https://doi.org/10.7937/K9/TCIA.2018.Y75F9W1

Paper that used this data:

Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

Papers

TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays_CVPR2018

MyNotes

ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases_CVPR2017spotlight

This paper and [TieNet] were written by the same team.

MyNotes

The following 3 papers are from the same team:

A Dataset for Breast Cancer Histopathological Image Classification

MyNotes --> presents an evaluation of different combinations of six visual feature descriptors with different classifiers.

Deep Features for Breast Cancer Histopathological Image Classification

MyNotes --> presents an evaluation of DeCAF features for breast cancer recognition (image classification).

Results: the main observation is that DeCAF features generally achieve better results than more traditional visual feature descriptors and, in some cases, outperform task-specific CNNs.

Breast Cancer Histopathological Image Classification using Convolutional Neural Networks_IJCNN2016 --> presents results from a CNN on this dataset. Because CNNs generally require large datasets, the authors use the random-patches trick, which consists of extracting sub-images in both the training and test phases. During training, the idea is to enlarge the training set by extracting patches at randomly defined positions. During testing, patches are extracted from a grid, and after each patch is classified, the patch-level results are combined. The authors show that this approach improves accuracy by about 4 to 6 percentage points.
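The random-patches trick described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: images are assumed to be 2D lists of pixel values, the grid is assumed to be non-overlapping, and averaging the per-patch class probabilities is only one assumed way to combine results (the note above does not fix the combination rule). All function names are hypothetical.

```python
import random


def crop(image, top, left, size):
    """Cut a size x size patch out of a 2D list of pixel values."""
    return [row[left:left + size] for row in image[top:top + size]]


def random_patches(image, size, count, rng):
    """Training phase: patches at randomly chosen positions."""
    height, width = len(image), len(image[0])
    patches = []
    for _ in range(count):
        top = rng.randint(0, height - size)    # randint is inclusive
        left = rng.randint(0, width - size)
        patches.append(crop(image, top, left, size))
    return patches


def grid_patches(image, size):
    """Test phase: non-overlapping patches on a regular grid."""
    height, width = len(image), len(image[0])
    return [crop(image, top, left, size)
            for top in range(0, height - size + 1, size)
            for left in range(0, width - size + 1, size)]


def combine_by_mean(patch_probs):
    """Average per-patch class probabilities; return (label, means)."""
    n_classes = len(patch_probs[0])
    means = [sum(p[c] for p in patch_probs) / len(patch_probs)
             for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: means[c])
    return label, means
```

A classifier would be applied to each grid patch to obtain `patch_probs`. That model is outside the scope of this sketch.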

Privileged information

Deep Learning Under Privileged Information Using Heteroscedastic Dropout_CVPR2018

we propose to utilize privileged information in order to control the variance of the Dropout. Since the Dropout’s variance is not constant, we call this a Heteroscedastic Dropout. Our empirical and theoretical analysis suggests that Heteroscedastic Dropout significantly increases the sample efficiency of both CNNs and RNNs, resulting in higher accuracy with much less data.

Heteroscedastic dropout: extends learning using privileged information (LUPI) from SVM-based methods to CNN/RNN-based methods. It uses additional information during training to control the variance of the dropout. Since the dropout's variance is not constant, it is called heteroscedastic dropout.
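The idea can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's implementation: the noise is taken to be multiplicative Gaussian noise with mean 1, and its standard deviation is computed from the privileged input by a hypothetical linear map followed by a softplus. The names `noise_std` and `heteroscedastic_dropout` and the parameters `weights` and `bias` are illustrative only.

```python
import math
import random


def softplus(z):
    """Smooth positive function, used here to keep the std above zero."""
    return math.log1p(math.exp(z))


def noise_std(privileged, weights, bias):
    """Hypothetical map from privileged features to a noise std."""
    z = sum(w * x for w, x in zip(weights, privileged)) + bias
    return softplus(z)


def heteroscedastic_dropout(activations, privileged, weights, bias,
                            rng, training=True):
    """Scale activations by Gaussian noise whose variance depends on
    the privileged input; pass activations through when not training."""
    if not training:
        # Privileged information exists only during training.
        return list(activations)
    std = noise_std(privileged, weights, bias)
    return [a * rng.gauss(1.0, std) for a in activations]
```

Because the standard deviation changes with the privileged input, the noise level differs from sample to sample, which is what makes the dropout heteroscedastic.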

Datasets - Pretrained CNN/RNN Models

Learning to Rank Using Privileged Information

NLP for pathology

Natural language processing in pathology: a scoping review_2016

Reviews and summarizes the study objectives; the NLP methods used and their validation (word/phrase matching, probabilistic machine learning, and rule-based systems); software implementations; performance on the datasets used; and any reported use in practice. Covers publications up to Sep. 2014.

Little work has been done on breast pathology reports, despite the high incidence of breast cancer.

The feasibility of using natural language processing to extract clinical information from breast pathology reports

Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification --> proposes a method for classifying breast cancer histopathology images that is independent of the magnification factor. The experimental results are competitive with previous state-of-the-art results obtained from hand-crafted features.

Vision-related work:

Show, attend and tell: Neural image caption generation with visual attention_ICML2015

Multi-level attention networks for visual question answering_CVPR2017

Pointer networks_NIPS2015Spotlight

Areas of attention for image captioning_ICCV2017

Dual attention networks for multimodal reasoning and matching_CVPR2017 ------ ChineseBlog