Bridging visual modalities and natural language is an interesting yet challenging task. It attracts growing research attention and requires interdisciplinary efforts from Computer Vision, Natural Language Processing and Machine Learning.
This repository contains recent papers, projects and materials on Image Captioning, Text-Image Matching and Text-to-Image Generation.
VIsual TRAnslator: Linking perceptions and natural language descriptions PDF
Learning visually grounded words and syntax for a scene description task PDF
Every picture tells a story: Generating sentences from images PDF
Babytalk: Understanding and generating simple image descriptions PDF
Show and Tell: A Neural Image Caption Generator (CVPR2015) PDF
Deep Visual-Semantic Alignments for Generating Image Descriptions (CVPR2015) PDF code site
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML2015) PDF code site
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks (NIPS2015) PDF
Areas of Attention for Image Captioning (ICCV2017) PDF
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning (CVPR2017) PDF code
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning (CVPR2017) PDF code
Self-critical Sequence Training for Image Captioning (CVPR2017) PDF
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning (AAAI2018) PDF code
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (CVPR2018) PDF code
Convolutional Image Captioning (CVPR2018) PDF code
Rethinking the Form of Latent States in Image Captioning (ECCV2018) PDF code
Recurrent Fusion Network for Image Captioning (ECCV2018) PDF
pytorch-tutorial/image_captioning
ruotianluo/ImageCaptioning.pytorch
alecwangcq/show-attend-and-tell
sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning
Deep Visual-Semantic Alignments for Generating Image Descriptions
Cross-modal Retrieval with Correspondence Autoencoder (ACMMM2014) PDF
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models (arXiv 2014) PDF
Multimodal Convolutional Neural Networks for Matching Image and Sentence (ICCV2015) PDF
Identity-Aware Textual-Visual Matching with Latent Co-attention (ICCV2017) PDF
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM (CVPR2017) PDF
Deep Cross-Modal Projection Learning for Image-Text Matching (ECCV2018) PDF
End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss (IJMIR2018) PDF
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models (CVPR2018) PDF
Generating Images From Captions with Attention (ICLR2016) PDF code
Learning What and Where to Draw (NIPS2016) PDF code
Generative Adversarial Text to Image Synthesis (ICML2016) PDF code
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (ICCV2017) PDF code
ChatPainter: Improving Text to Image Generation using Dialogue (arXiv 2018) PDF
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks (CVPR2018) PDF code
Text2Scene: Generating Abstract Scenes from Textual Descriptions (arXiv 2018) PDF