Image to Text using Attention

Implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

Setup Environment

This demo uses the Theano framework.

Anaconda

We highly recommend using the Anaconda platform to manage the Python dependencies and virtual environments. Otherwise, you will have to install each of Theano's dependencies manually.

Installing Theano

After Anaconda is installed, run

conda install theano

Running the code

Data

Download the images and annotations (Flickr8k, Flickr30k, COCO, etc.).
Edit data/data_generation_params.json to point to the annotation file and the image dataset.
Resize each image to 224x224x3.
Run data/data_generation.py to generate the image and annotation pickles.
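
The resizing step can be sketched as follows. This is only an illustration and not the code in data/data_generation.py: the helper names to_rgb and resize_nearest are hypothetical, the sketch assumes each image is already loaded as a NumPy array of shape (height, width) or (height, width, channels), and it uses plain nearest-neighbour sampling. An image library with proper interpolation will usually give better-looking results.

```python
import numpy as np


def to_rgb(image):
    """Return an H x W x 3 array (hypothetical helper).

    Grayscale input is replicated across three channels; any channel
    beyond the third (for example alpha) is dropped.
    """
    if image.ndim == 2:
        image = np.stack([image] * 3, axis=-1)
    return image[:, :, :3]


def resize_nearest(image, height=224, width=224):
    """Resize an H x W x C array to height x width x C (hypothetical helper).

    Uses nearest-neighbour sampling: each target pixel copies the source
    pixel that its row and column index map back to.
    """
    src_h, src_w = image.shape[:2]
    rows = np.arange(height) * src_h // height  # source row for each target row
    cols = np.arange(width) * src_w // width    # source column for each target column
    return image[rows[:, None], cols]


if __name__ == "__main__":
    # A blank 480x640 RGB array stands in for a real photo.
    photo = np.zeros((480, 640, 3), dtype=np.uint8)
    resized = resize_nearest(to_rgb(photo))
    print(resized.shape)
```

Note that this stretches the image to a square without preserving the aspect ratio; whether to crop first is a choice left to you.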

Training

python train.py

Demo

python demo.py

Results

A group of people stand together.

A girl is in a field.

Dog running in field

Acknowledgements:

arctic-captions for code/reference
deep-learning tutorial for code/reference

MIT License