Image-Text-Papers

Papers related to image captioning and image generation.

This list is a work in progress.

Image Captioning (Image --> Text)

Bernardi, Raffaella, et al. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures. J. Artif. Intell. Res. (JAIR) 55 (2016): 409-442. [pdf]
Karpathy, Andrej. Connecting Images and Natural Language. Diss. Stanford University, 2016. [pdf]

Kiros R, Salakhutdinov R, Zemel R S. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539, 2014. [pdf]
Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions. CVPR, 2015: 3128-3137. [pdf]

Vinyals, Oriol, et al. Show and tell: A neural image caption generator. CVPR, 2015. [pdf]
Xu, Kelvin, et al. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015. [pdf]
Anderson, Peter, et al. Bottom-up and top-down attention for image captioning and VQA. arXiv preprint arXiv:1707.07998 (2017). [pdf]

Rennie, Steven J., et al. Self-critical Sequence Training for Image Captioning. CVPR, 2017. [pdf]
Liu, Siqi, et al. Improved Image Captioning via Policy Gradient optimization of SPIDEr. ICCV, 2017. [pdf] [video]
Zhou Ren, Xiaoyu Wang, Ning Zhang, et al. Deep Reinforcement Learning-based Image Captioning with Embedding Reward. CVPR, 2017. [pdf] [video]
Chen T H, Liao Y H, Chuang C Y, et al. Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner. ICCV, 2017. [pdf] [Supplementary]
Dai B, Lin D, Urtasun R, et al. Towards Diverse and Natural Image Descriptions via a Conditional GAN. ICCV, 2017. [pdf] [video]

Image Generation (Text --> Image)

Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759 (2016). [pdf]
Gregor, Karol, et al. DRAW: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623 (2015). [pdf]
Mansimov, Elman, et al. Generating images from captions with attention. arXiv preprint arXiv:1511.02793 (2015). [pdf]

Gauthier, Jon. Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester 2014.5 (2014): 2. [pdf]
Reed S, Akata Z, Yan X, et al. Generative adversarial text to image synthesis. ICML, 2016. [pdf] [Supplementary]
Reed, Scott E., et al. Learning what and where to draw. NIPS, 2016. [pdf]
Zhang H, Xu T, Li H, et al. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. ICCV, 2017. [pdf] [video]
Zhang H, Xu T, Li H, et al. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. arXiv preprint arXiv:1710.10916, 2017. [pdf]
Xu, Tao, et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. arXiv preprint arXiv:1711.10485 (2017). [pdf]
Hao Dong, Simiao Yu, Chao Wu, Yike Guo. Semantic Image Synthesis via Adversarial Learning. ICCV, 2017. [pdf] [Supplementary]
Dash, Ayushman, et al. TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network. arXiv preprint arXiv:1703.06412, 2017. [pdf]
Nguyen, Anh, et al. Plug & play generative networks: Conditional iterative generation of images in latent space. CVPR, 2017. [pdf] [Supplementary]