Image Caption Generator

A Neural Network based generative model for captioning images.

Check out the Android app made using this image-captioning model, Cam2Caption, and the associated paper.

Work in Progress

Updates(Jan 14, 2018):
  1. Some Code Refactoring.
  2. Added MSCOCO dataset support.
Updates(Mar 12, 2017):
  1. Added a Dropout layer for the LSTM and Xavier-Glorot initialization for weights
  2. Significant optimizations to caption generation (the decode routine); computation time reduced from 3 seconds to 0.2 seconds
  3. Functionality to freeze graphs and merge them.
  4. Direct serving (dual-graph and single-graph) routines in /utils/
  5. Explored image-preprocessing methods and adopted the fastest and most efficient one.
  6. Ported code to TensorFlow r1.0
Updates(Feb 27, 2017):
  1. Added BLEU evaluation metric and batch processing of images to produce batches of captions.
Updates(Feb 25, 2017):
  1. Added optimizations and one-time pre-processing of Flickr30K data
  2. Changed to a faster Image Preprocessing method using OpenCV
To-Do(Open for Contribution):
  1. FIFO-queues in training (a minimal queue sketch follows this list)
  2. Attention-Model
  3. Trained Models for Distribution.
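
For contributors, here is a minimal sketch of the TensorFlow r1.0 queue API that the FIFO-queue to-do would build on; the dimensions (1536-d InceptionV4 features, captions padded to 20 ids) are assumptions, not the project's actual values:

    import tensorflow as tf

    # Hypothetical shapes: 1536-d InceptionV4 features, captions padded to 20 ids.
    feat_ph = tf.placeholder(tf.float32, shape=[1536])
    cap_ph = tf.placeholder(tf.int32, shape=[20])

    # A FIFOQueue decouples data feeding from the training loop.
    queue = tf.FIFOQueue(capacity=128,
                         dtypes=[tf.float32, tf.int32],
                         shapes=[[1536], [20]])
    enqueue_op = queue.enqueue([feat_ph, cap_ph])

    # The training graph would consume mini-batches via dequeue_many.
    feat_batch, cap_batch = queue.dequeue_many(32)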

Pre-Requisites:

  1. TensorFlow r1.0
  2. NLTK
  3. pandas
  4. Download Flickr30K OR MSCOCO images and captions.
  5. Download the pre-trained InceptionV4 TensorFlow graph from DeepDetect, available here. A quick load-check sketch follows this list.
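
Before training, it is worth sanity-checking that the downloaded inception_v4.pb parses and imports cleanly; a minimal sketch (the path assumes the ConvNets folder from the procedure below):

    import tensorflow as tf

    # Parse the frozen InceptionV4 graph and import it into a fresh graph.
    with tf.gfile.GFile('ConvNets/inception_v4.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        print('%d ops imported' % len(graph.get_operations()))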

Procedure to Train and Generate Captions:

  1. Clone the repository to preserve the directory structure.
  2. For Flickr30K, put results_20130124.token and the Flickr30K images in the flickr30k-images folder; for MSCOCO, put captions_val2014.json and the MSCOCO images in the COCO-images folder.
  3. Put inception_v4.pb in the ConvNets folder.
  4. Generate features (features.npy) for the images in the dataset folder by running (a quick check of the output appears after this list):
    • For Flickr30K: python convfeatures.py --data_path Dataset/flickr30k-images --inception_path ConvNets/inception_v4.pb
    • For MSCOCO: python convfeatures.py --data_path Dataset/COCO-images --inception_path ConvNets/inception_v4.pb
  5. To train the model, run:
    • For Flickr30K: python main.py --mode train --caption_path ./Dataset/results_20130124.token --feature_path ./Dataset/features.npy --resume
    • For MSCOCO: python main.py --mode train --caption_path ./Dataset/captions_val2014.json --feature_path ./Dataset/features.npy --data_is_coco --resume
  6. To generate captions for an image, run:
    • python main.py --mode test --image_path VALID_PATH
  7. For usage as a Python library, see Demo.ipynb.

(see python main.py -h for more)
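
After step 4 you can verify the generated features.npy; the exact layout is an assumption (convfeatures.py may store a plain array or a pickled object), so this is only a quick shape check:

    import numpy as np

    # Expecting roughly [num_images, 1536] for InceptionV4 features,
    # but the stored structure may differ.
    features = np.load('Dataset/features.npy')
    print(type(features), getattr(features, 'shape', None))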

Miscellaneous Notes:

Freezing the encoder and decoder graphs

  1. Save both the encoder and decoder graphs while running test. This one-time run is required before freezing the encoder/decoder.
    • python main.py --mode test --image_path ANY_TEST_IMAGE.jpg/png --saveencoder --savedecoder
  2. In the project root directory, run python utils/save_graph.py --mode encoder --model_folder model/Encoder/. Add --read_file if you want the frozen encoder to generate a caption directly from an image file path. Similarly, for the decoder, run python utils/save_graph.py --mode decoder --model_folder model/Decoder/; the --read_file argument is not needed for the decoder. A sketch of what freezing does under the hood follows this section.
  3. To use the frozen encoder and decoder models as a dual blackbox, see Serve-DualProtoBuf.ipynb. Note: you must freeze the encoder graph with --read_file to run this notebook.

(see python utils/save_graph.py -h for more)
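
Under the hood, freezing a graph in TensorFlow r1.0 means restoring a checkpoint and converting its variables to constants; a minimal sketch of that idea (the checkpoint paths and the output node name 'decoder/output' are hypothetical; utils/save_graph.py handles the real ones):

    import tensorflow as tf
    from tensorflow.python.framework import graph_util

    with tf.Session() as sess:
        # Restore the saved decoder graph and its weights.
        saver = tf.train.import_meta_graph('model/Decoder/model.meta')
        saver.restore(sess, tf.train.latest_checkpoint('model/Decoder/'))
        # Bake the variables into constants, keeping only the output node's ancestry.
        frozen = graph_util.convert_variables_to_constants(
            sess, sess.graph_def, ['decoder/output'])
        with tf.gfile.GFile('model/Trained_Graphs/decoder_frozen_model.pb', 'wb') as f:
            f.write(frozen.SerializeToString())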

Merging the encoder and decoder graphs for serving the model as a blackbox:

  1. Freeze the encoder and decoder as described above.
  2. In the project root directory, run:
    • python utils/merge_graphs.py --encpb ./model/Trained_Graphs/encoder_frozen_model.pb --decpb ./model/Trained_Graphs/decoder_frozen_model.pb. Add --read_file if you want the merged graph to generate a caption directly from an image file path.
  3. To use the merged encoder and decoder models as a single frozen blackbox, see Serve-SingleProtoBuf.ipynb (a serving sketch follows this section). Note: you must freeze and merge the encoder graph with --read_file to run this notebook.

(see python utils/merge_graphs.py -h for more)
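
Serving the merged graph then reduces to importing one protobuf and running a session; a minimal sketch, assuming the graph was merged with --read_file so the input is an image file path (the merged file name and the tensor names 'input:0' and 'caption:0' are hypothetical; Serve-SingleProtoBuf.ipynb shows the real ones):

    import tensorflow as tf

    with tf.gfile.GFile('model/Trained_Graphs/merged_frozen_model.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        inp = graph.get_tensor_by_name('input:0')      # hypothetical input tensor
        out = graph.get_tensor_by_name('caption:0')    # hypothetical output tensor
        with tf.Session(graph=graph) as sess:
            print(sess.run(out, feed_dict={inp: 'ANY_TEST_IMAGE.jpg'}))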

Training Steps vs. Loss Graph in TensorBoard:

  1. tensorboard --logdir model/log_dir
  2. Navigate to localhost:6006
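
The loss curve comes from scalar summaries written to model/log_dir during training; a minimal sketch of how such logging is typically wired in TensorFlow r1.0 (the placeholder loss here is illustrative, not the model's actual loss op):

    import tensorflow as tf

    loss = tf.placeholder(tf.float32, name='loss')  # stand-in for the real loss op
    tf.summary.scalar('loss', loss)
    merged = tf.summary.merge_all()

    with tf.Session() as sess:
        writer = tf.summary.FileWriter('model/log_dir', sess.graph)
        for step in range(3):
            summary = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
            writer.add_summary(summary, step)
        writer.close()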

Citation:

If you use our model or code in your research, please cite the paper:

@article{Mathur2017,
  title={Camera2Caption: A Real-time Image Caption Generator},
  author={Pranay Mathur and Aman Gill and Aayush Yadav and Anurag Mishra and Nand Kumar Bansode},
  journal={IEEE Conference Publication},
  year={2017}
}

Reference:

Show and Tell: A Neural Image Caption Generator

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

License:

Protected under the BSD 3-Clause License.

Some Examples:

(Example images with generated captions.)
