deyachatterjee / Image-NLP-classification

Pathology Image Classification Assisting with NLP Classifier

The final goal is to classify pathology images, using text descriptions to improve performance. The project is still in progress.

Work completed so far

  1. Use PDFMiner and PyPDF2 to extract text and images from PDF files. Here

  2. NLP analysis: implement text-mining tutorials (preprocessing and word2vec) with Gensim and NLTK. Here

  3. Img2Vec: use pre-trained (or non-pre-trained) models in PyTorch to extract a vector embedding from any image and calculate the similarity between embeddings. Here
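
The similarity step in item 3 can be illustrated with a small sketch. It assumes cosine similarity as the measure and uses made-up 4-dimensional vectors in place of real image embeddings; the names `cosine_similarity`, `most_similar`, `slide_a`, and `slide_b` are hypothetical and not taken from the project code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return (name, score) of the candidate embedding closest to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy embeddings (assumption): real ones would come from a CNN feature layer.
embeddings = {
    "slide_a": [0.9, 0.1, 0.0, 0.3],
    "slide_b": [0.1, 0.8, 0.5, 0.0],
}
query = [1.0, 0.0, 0.1, 0.2]
best_name, best_score = most_similar(query, embeddings)
print(best_name, round(best_score, 3))
```

With real embeddings the vectors would typically have several hundred dimensions or more, but the comparison works the same way.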

Part of the reading list:

Datasets

  1. Images: ChestXray-NIHCC

Descriptions

Reports: [OpenI](https://openi.nlm.nih.gov) (3,643 unique front-view images and their corresponding reports are selected). Papers that used this data:

TieNet_CVPR2018, ChestX-ray8_CVPR2017spotlight

  1. Images: Breast Cancer Histopathological Database (BreakHis)

Papers that used this data:

  1. Reports: MIMIC

  2. Images:

The original H&E stained whole-slide images used in this work can be downloaded from the Genomic Data Commons. All TCGA molecular data can be obtained from the Genomic Data Commons, as well as derived data matrices of the PanCancer Atlas. Integration with immune signatures of the TCGA immune response working group is available through CRI iAtlas web resource. Links to these data resources can be found at the accompanying publication manuscript page (https://gdc.cancer.gov/about-data/publications/tilmap).

These software resources, as well as the tumor-infiltrating lymphocyte (TIL) maps, are available on the Cancer Imaging Archive at: https://doi.org/10.7937/K9/TCIA.2018.Y75F9W1

Paper that used this data:

Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

Papers

  1. TieNet:Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays_CVPR2018

MyNotes

ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases_CVPR2017spotlight

This paper and [TieNet] were written by the same team.

  1. Explainable Prediction of Medical Codes from Clinical Text MyNotes

  2. Deep Visual-Semantic Alignments for Generating Image Descriptions(2015CVPR)

MyNotes

  1. The following 3 papers are from the same team:

MyNotes --> presents an evaluation of different combinations of six visual feature descriptors with different classifiers.

MyNotes --> presents an evaluation of DeCAF features for breast cancer recognition (image classification).

Results: The main observation is that DeCAF features generally achieve better results than more traditional visual feature descriptors and, in some cases, outperform task-specific CNNs.

  • Breast Cancer Histopathological Image Classification using Convolutional Neural Networks_IJCNN2016 --> presents results from a CNN on this dataset. Because CNNs generally require large datasets, the authors use the random-patches trick, which extracts sub-images in both the training and test phases. During training, patches are extracted at randomly chosen positions to enlarge the training set. At test time, patches are extracted from a grid, each patch is classified, and the per-patch results are combined. The authors report that this approach improves accuracy by about 4 to 6 percentage points.
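
The random-patches trick described above can be sketched roughly as follows. This is not the authors' code: the image is a plain nested list, `toy_classifier` is a hypothetical stand-in for a trained CNN, and majority voting is only one assumed way of combining per-patch results.

```python
import random
from collections import Counter


def crop(image, top, left, size):
    """Return a size x size sub-image starting at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def random_patches(image, size, count, rng):
    """Training phase: crop `count` patches at randomly chosen positions."""
    height, width = len(image), len(image[0])
    return [
        crop(image, rng.randint(0, height - size), rng.randint(0, width - size), size)
        for _ in range(count)
    ]


def grid_patches(image, size):
    """Test phase: crop non-overlapping patches on a regular grid."""
    height, width = len(image), len(image[0])
    return [
        crop(image, top, left, size)
        for top in range(0, height - size + 1, size)
        for left in range(0, width - size + 1, size)
    ]


def combine_by_majority(patch_labels):
    """Fuse per-patch predictions into one image-level label (assumed rule)."""
    return Counter(patch_labels).most_common(1)[0][0]


def toy_classifier(patch):
    """Hypothetical stand-in for a CNN: thresholds the mean pixel intensity."""
    pixels = [value for row in patch for value in row]
    return "malignant" if sum(pixels) / len(pixels) > 0.5 else "benign"


# 8x8 toy image: the two leftmost columns are dark, the rest are bright.
image = [[0.0 if col < 2 else 1.0 for col in range(8)] for _ in range(8)]

train_patches = random_patches(image, size=2, count=5, rng=random.Random(0))
test_patches = grid_patches(image, size=2)
image_label = combine_by_majority([toy_classifier(p) for p in test_patches])
print(len(train_patches), len(test_patches), image_label)
```

Other fusion rules, such as averaging per-patch class probabilities, would slot into `combine_by_majority` without changing the rest of the sketch.
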
  1. Privileged information

Deep Learning Under Privileged Information Using Heteroscedastic Dropout_CVPR2018

we propose to utilize privileged information in order to control the variance of the Dropout. Since the Dropout’s variance is not constant, we call this a Heteroscedastic Dropout. Our empirical and theoretical analysis suggests that Heteroscedastic Dropout significantly increases the sample efficiency of both CNNs and RNNs, resulting in higher accuracy with much less data.

Heteroscedastic dropout extends learning using privileged information (LUPI) from SVM-based methods to CNN/RNN-based methods. Additional information that is available only during training is used to control the variance of the dropout. Because that variance is not constant, the method is called heteroscedastic dropout.
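
To make the idea concrete, here is a minimal sketch, not the paper's implementation. It assumes multiplicative Gaussian noise with mean 1 whose standard deviation is computed from the privileged features by a hypothetical linear map followed by softplus. The names `noise_std`, `weights`, and `bias` are illustrative only. In a real model this mapping would be learned and would usually produce one variance per unit.

```python
import math
import random


def softplus(value):
    """Smooth, strictly positive function, used here to produce a std."""
    return math.log1p(math.exp(value))


def noise_std(privileged, weights, bias=0.0):
    """Hypothetical linear map from privileged features to a noise std."""
    return softplus(sum(w * p for w, p in zip(weights, privileged)) + bias)


def heteroscedastic_dropout(activations, privileged=None, weights=None, bias=0.0, rng=None):
    """Scale activations by Gaussian noise whose spread depends on `privileged`.

    When `privileged` is None (test time, where privileged information is
    unavailable), the activations pass through unchanged.
    """
    if privileged is None:
        return list(activations)
    rng = rng or random.Random()
    std = noise_std(privileged, weights, bias)
    return [a * rng.gauss(1.0, std) for a in activations]


activations = [0.5, -1.2, 2.0]
weights = [0.8, -0.3]  # assumed fixed here; learned in a real model

# Training: the noise level is set by the privileged features of this sample.
train_out = heteroscedastic_dropout(
    activations, privileged=[1.0, 2.0], weights=weights, rng=random.Random(0)
)
# Test: no privileged information, so no noise is applied.
test_out = heteroscedastic_dropout(activations)
print(train_out, test_out)
```

The point of the sketch is only that the noise level differs from sample to sample, driven by information the model never sees at test time.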

Datasets - Pretrained CNN/RNN Models

Learning to Rank Using Privileged Information

  1. NLP for pathology

Natural language processing in pathology: a scoping review_2016 --> reviews and summarizes, for studies published up to Sep. 2014: the study objectives; the NLP methods used and their validation (word/phrase matching, probabilistic machine learning, and rule-based systems); software implementations; performance on the dataset used; and any reported use in practice.

Little work has been done on breast pathology reports, despite the high incidence of breast cancer.
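
As an illustration of the word/phrase-matching and rule-based approaches the review covers, here is a minimal sketch. The phrase lexicon, the codes, the negation cues, and the sample report are all invented for this example; they do not come from any of the reviewed systems.

```python
import re

# Hypothetical phrase lexicon; a real system would use a curated vocabulary.
DIAGNOSIS_PHRASES = {
    "invasive ductal carcinoma": "IDC",
    "ductal carcinoma in situ": "DCIS",
    "fibroadenoma": "FIBROADENOMA",
}
# Hypothetical negation cues checked in the text preceding a matched phrase.
NEGATION_CUES = ("no evidence of", "negative for", "without")


def extract_findings(report):
    """Return (code, negated) pairs for lexicon phrases found in the report."""
    findings = []
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for phrase, code in DIAGNOSIS_PHRASES.items():
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if match is None:
                continue
            # Rule: a negation cue earlier in the same sentence negates the phrase.
            prefix = sentence[:match.start()]
            negated = any(cue in prefix for cue in NEGATION_CUES)
            findings.append((code, negated))
    return findings


report = (
    "Left breast, core biopsy: invasive ductal carcinoma, grade 2. "
    "No evidence of ductal carcinoma in situ."
)
findings = extract_findings(report)
print(findings)
```

Rules like this are easy to inspect but brittle; real reports would need handling for abbreviations, misspellings, and wider negation scope.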

The feasibility of using natural language processing to extract clinical information from breast pathology reports

  1. Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification --> proposes a method for classifying breast cancer (BC) histopathology images that is independent of the magnification factor. The experimental results are competitive with previous state-of-the-art results obtained from hand-crafted features.

  2. vision-related work:

Show, attend and tell: Neural image caption generation with visual attention_ICML2015

Multi-level attention networks for visual question answering_CVPR2017

Pointer networks_NIPS2015Spotlight

Areas of attention for image captioning_ICCV2017

Dual attention networks for multimodal reasoning and matching_CVPR2017 ------ ChineseBlog

  1. Medical-image-related:

Learning to read chest X-rays: recurrent neural cascade model for automated image annotation_CVPR2016

MDNet: a semantically and visually interpretable medical image diagnosis network_CVPR2017

  1. NLP-related work:

Neural machine translation by jointly learning to align and translate_ICLR2015

Encoding source language with convolutional neural network for machine translation_ACL-CoNLL2015

A neural attention model for abstractive sentence summarization_EMNLP2015

Not all contexts are created equal: Better word representations with variable attention_EMNLP2015

A structured self-attentive sentence embedding_ICLR2017

Learning natural language inference using bidirectional LSTM model and inner-attention_2016

  1. BioNLP2017

  2. BioNLP2018WorkShop

  3. Entity Recognition

  4. Identifying key sentences in abstracts of oncological articles to aid evidence-based medicine

  5. Extracting heart disease risk factors from clinical documents

  6. PICO abstract Element Detection

  7. Disease Phrase Matching - https://github.com/dhwajraj/

Languages

Language:Python 100.0%