SemStyle

Code and models for learning to generate stylized image captions from unaligned text.

A full list of semstyle results on MSCOCO images can be found here.

A pytorch rewrite of SemStyle is included in "./code".

This software is written in python 2.7 it assumes that you have installed (or a conda environment containing) torch, torchvision, scipy
To setup the semstyle models go to "./code/models/" and run "download.sh". Then from "./code" run: python img_to_text.py --test_folder <folder with your test images>
- The --cpu flag will disable the gpu.
- If not already installed, the system will start by downloading inception.v3 image classification model to the local torch directory, this may take a while (https://pytorch.org/docs/stable/torchvision/models.html)
Training code is included.
Scripts to generate the training data from publicly available sources is coming.

Online demo

For a limited time (while cpu cycles last): Live demo, caption your own images

A blog post decribing the SemStyle system is here: http://cm.cecs.anu.edu.au/post/semstyle/

Citing this work:

Alexander Mathews, Lexing Xie, Xuming He. SemStyle: Learning to Generate Stylised Image Captions Using Unaligned Text, in Conference on Computer Vision and Pattern Recognition (CVPR ‘18), Salt Lake City, USA, 2018. https://arxiv.org/abs/1805.07030

@inproceedings{mathews2018semstyle,
  title={{SemStyle}:  Learning to Generate Stylised Image Captions using Unaligned Text},
  author={Mathews, Alexander and Xie, Lexing and He, Xuming},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}

About

Code for learning to generate stylized image captions from unaligned text

Languages

Language:Python 85.9%Language:Jupyter Notebook 13.7%Language:Shell 0.3%Language:HTML 0.1%