Intro

Papers that combine CV with NLP tasks, with a focus on Medical Report Generation, Image/Video Captioning, VQA, Anchor-free Object Detection, and Weakly Supervised Segmentation.

Image/Video Captioning
Paragraph Description Generation
Visual Question Answering
Medical Report Generation
Medical Image Processing
Object Detection
Segmentation
Weakly Supervised Segmentation
Metrics
Others

Papers and Codes/Notes

Image/Video Captioning

CNN-RNN
- Show and Tell: A Neural Image Caption Generator, Oriol Vinyals et al, CVPR 2015, Google(pdf)
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Kelvin Xu et al, ICML 2015(pdf)(code)
- Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, PAMI 2016(pdf)(code)
- Areas of Attention for Image Captioning, ICCV 2017(pdf)
- Rethinking the Form of Latent States in Image Captioning, ECCV 2018, CUHK(pdf)
- Recurrent Fusion Network for Image Captioning, ECCV 2018, Tencent AI Lab, Fudan University(pdf)
- Move Forward and Tell: A Progressive Generator of Video Descriptions, ECCV 2018, CUHK(pdf)
- Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks, CVPR 2016(pdf)
CNN-CNN
- Convolutional Image Captioning, CVPR 2018(pdf)(code)
Reinforcement Learning
- Improving Reinforcement Learning Based Image Captioning with Natural Language Prior, 2018, Tencent/IBM(pdf)
- End-to-End Video Captioning with Multitask Reinforcement Learning(pdf)
Others
- A Neural Compositional Paradigm for Image Captioning, NIPS 2018, CUHK(pdf)

Paragraph Description Generation

CNN-RNN
- DenseCap: Fully Convolutional Localization Networks for Dense Captioning, Justin Johnson et al, CVPR 2016, Stanford(homepage)(code)
- A Hierarchical Approach for Generating Descriptive Image Paragraphs, Jonathan Krause et al, CVPR 2017, Stanford(homepage)(dense-caption code)
- Recurrent Topic-Transition GAN for Visual Paragraph Generation, ICCV 2017
- Diverse and Coherent Paragraph Generation from Images, ECCV 2018(code)

Visual Question Answering

CNN-RNN
- Multi-level Attention Networks for Visual Question Answering, CVPR 2017
- Motion-Appearance Co-Memory Networks for Video Question Answering, 2018
- Deep Attention Neural Tensor Network for Visual Question Answering, ECCV 2018, HIT
- Question-Guided Hybrid Convolution for Visual Question Answering, Peng Gao et al, ECCV 2018, CUHK(pdf)

Medical Report Generation

CNN-RNN
- Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation, CVPR 2016(pdf)
- TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays, Xiaosong Wang et al, CVPR 2018, NIH(pdf)(author's homepage)
- On the Automatic Generation of Medical Imaging Reports, Baoyu Jing et al., ACL 2018, CMU(pdf)(author's homepage)
- Multimodal Recurrent Model with Attention for Automated Radiology Report Generation, Yuan Xue et al., MICCAI 2018, PSU(pdf)
- Attention-Based Abnormal-Aware Fusion Network for Radiology Report Generation, Xiancheng Xie et al., 2019, Fudan University
- Addressing Data Bias Problems for Chest X-ray Image Report Generation, Philipp Harzig et al., 2019, University of Augsburg(pdf)
Reinforcement Learning
- Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation, Christy Y. Li et al, NIPS 2018, CMU(pdf)(author's homepage)
Knowledge Graph
- Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation, Christy Y. Li et al, AAAI 2019, DU(pdf)
Other
- TextRay: Mining Clinical Reports to Gain a Broad Understanding of Chest X-rays, MICCAI 2018(pdf)
Blogs
- A Survey of Medical Report Generation (blog post in Chinese)

Medical Image Processing

Common Datasets

NIH Chest X-ray8/14(download link)(kaggle's download link)
- ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, CVPR 2017, NIH(pdf)
Open-i Chest X-Ray(download link)
Radiology Objects in COntext(ROCO)
- Radiology Objects in COntext (ROCO): A Multimodal Image Dataset, MICCAI 2018(intro)(pdf)(download)

Medical Tasks

Detection
- CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning, 2018, Andrew Ng
- Attention-Guided Curriculum Learning for Weakly Supervised Classification and Localization of Thoracic Diseases on Chest Radiographs, Yuxing Tang et al, MICCAI-MLMI oral 2018, NIH(pdf)
- DeepRadiologyNet: Radiologist Level Pathology Detection in CT Head Images
- Lesion Region Detection Methods for Lung CT Images (in Chinese)
- Benign/Malignant Lung Tumor Prediction Based on Quantitative Radiomics (in Chinese)
Enhance
- Super Resolution
  - Image Super-Resolution Using Deep Convolutional Networks
  - Deeply-Recursive Convolutional Network for Image Super-Resolution
Segmentation
- U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015 MICCAI
- A 3D Coarse-to-Fine Framework for Automatic Pancreas Segmentation

Object Detection

Weakly-supervised
- Learning Deep Features for Discriminative Localization, Bolei Zhou et al, CVPR 2016, MIT(pdf)(code)(note)
Anchor-based
- SSD: Single Shot MultiBox Detector, Wei Liu et al, ECCV 2016, UNC Chapel Hill(pdf)(code)(blog)
- YOLO9000: Better, Faster, Stronger, Joseph Redmon et al, CVPR 2017(pdf)(project)(code)
- FPN, Feature Pyramid Networks for Object Detection, Tsung-Yi Lin et al., CVPR 2017, FAIR(pdf)(blog)
Anchor-free
- YOLO, You Only Look Once: Unified, Real-Time Object Detection, Joseph Redmon et al, CVPR 2016(pdf)(note)
- CornerNet, CornerNet: Detecting Objects as Paired Keypoints, Hei Law et al, ECCV 2018, Michigan University(pdf)(code)(blog)
- FCOS, FCOS: Fully Convolutional One-Stage Object Detection, Zhi Tian et al, ICCV 2019, Adelaide University(pdf)(code)(blog)
- CenterNet, Objects as Points, Xingyi Zhou et al, 2019, UT Austin(pdf)(code)
Others
- Bag of Freebies for Training Object Detection Neural Networks, Zhi Zhang et al, 2019, Amazon (Mu Li)(pdf)
- Deformable Convolutional Networks, Jifeng Dai et al, ICCV 2017, Microsoft Research Asia(pdf)(code)

Segmentation

Semantic Segmentation
- PSPNet, Pyramid Scene Parsing Network, Hengshuang Zhao et al., CVPR 2017, CUHK(pdf)(code)
Instance Segmentation
- Mask R-CNN, Kaiming He et al, ICCV 2017(Best Paper), Facebook AI Research (FAIR)(pdf)(code)

Weakly Supervised Segmentation

Bounding Box Supervision
- Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation, Liang-Chieh Chen et al., ICCV 2015, UCLA(pdf)(deeplab-v1-code)(model)(note)
- BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, Jifeng Dai et al., ICCV 2015, Microsoft Research(pdf)
- Simple Does It: Weakly Supervised Instance and Semantic Segmentation, Anna Khoreva et al., CVPR 2017, Max Planck Institute for Informatics(pdf)(code)(tf-code)
- Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation, Chunfeng Song et al, CVPR 2019, CASIA(pdf)
Image Label Supervision
- Fully Convolutional Multi-Class Multiple Instance Learning, Deepak Pathak et al., ICLR 2015, UC Berkeley(pdf)(note)
- From Image-level to Pixel-level Labeling with Convolutional Networks, Pedro O. Pinheiro et al., CVPR 2015, Idiap Research Institute, Martigny(pdf)(note)
- DSRG, Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing, Zilong Huang et al., CVPR 2018, HUST(pdf)(code)
- SSENet, Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation, Yude Wang et al., 2019, CAS(pdf)(code)
Others
- DenseCRF, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, Philipp Krahenbuhl et al., NIPS 2011, Stanford University(pdf)(homepage)(code)
- A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains, Lyndon Chan et al., 2019(pdf)
Good References
- JackieZhangdx's WeakSupervisedSegmentationList

Metrics

BLEU
- BLEU: a method for automatic evaluation of machine translation, Kishore Papineni et al, ACL 2002(pdf)
CIDEr
- CIDEr: Consensus-based Image Description Evaluation, CVPR 2015(pdf)(note)

Others

Visual Commonsense Reasoning (VCR)
- From Recognition to Cognition: Visual Commonsense Reasoning, Rowan Zellers et al, 2018, Paul G. Allen School(homepage)(pdf)
Language Model
- Transformer: Attention Is All You Need, Ashish Vaswani et al, NIPS 2017, Google Brain/Research(pdf)(code)(blog)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin et al, 2018, Google AI Language(pdf)(code)(slides)
- ELMo: Deep contextualized word representations, Matthew E. Peters et al, NAACL 2018, Paul G. Allen School(homepage)(pdf)(code-tf)
Teacher Forcing Policy
- A learning algorithm for continually running fully recurrent neural networks, Ronald J. Williams et al, Neural Computation 1989(pdf)(note)
- Professor Forcing: A New Algorithm for Training Recurrent Networks, Alex Lamb et al, NIPS 2016(pdf)
Classification
- VGG, Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan et al., ICLR 2015(pdf)
- Inception, Going Deeper with Convolutions, Christian Szegedy et al, CVPR 2015, Google(pdf)
- ResNet, Deep Residual Learning for Image Recognition, Kaiming He et al, CVPR 2016, Microsoft Research(pdf)(code)(blog)
- SENet: Squeeze-and-Excitation Networks, Jie Hu et al, CVPR 2018, Momenta (an autonomous driving company) and Oxford University(pdf)(code)(blog)
