gorgeousyouth / PriorImageCaption


PyTorch implementation of "Improving Reinforcement Learning Based Image Captioning with Natural Language Prior".

Requirements

Python 2.7

PyTorch 0.4 (along with torchvision)

cider package (copy it from Here and place it in cider/)

pycoco package (copy it from Here and extract it to pycoco/)

You also need to download a pretrained ResNet model for both training and evaluation. The models can be downloaded from here and should be placed in data/imagenet_weights.
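For reference, a minimal sketch of how such a checkpoint is typically loaded with torchvision (the file name resnet101.pth is an assumption; use whatever the download provides):

```python
# Minimal loading sketch; 'resnet101.pth' is an assumed file name for the
# downloaded checkpoint, not necessarily what the link provides.
import torch
import torchvision.models as models

net = models.resnet101()
net.load_state_dict(torch.load('data/imagenet_weights/resnet101.pth'))
net.eval()  # inference mode for feature extraction
```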

Train your own network on COCO

Download COCO captions and preprocess them

Download the preprocessed COCO captions from link (Karpathy's split). Copy dataset_coco.json, captions_train.json, captions_val.json, and captions_test.json into data/features.

Then do:

$ python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk

prepro_labels.py maps all words that occur <= 5 times to a special UNK token and builds a vocabulary from the remaining words. The image information and vocabulary are dumped into data/cocotalk.json, and the discretized caption data is dumped into data/cocotalk_label.h5.
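For intuition, here is a small sketch of that thresholding rule (illustrative only, not the actual prepro_labels.py code):

```python
# Illustrative vocabulary thresholding: words seen <= 5 times become UNK.
from collections import Counter

def build_vocab(captions, threshold=5):
    counts = Counter(w for cap in captions for w in cap.split())
    return set(w for w, n in counts.items() if n > threshold)

def encode(caption, vocab):
    return [w if w in vocab else 'UNK' for w in caption.split()]

vocab = build_vocab(['a man riding a horse', 'a man on a horse'] * 6)
print(encode('a man riding a zebra', vocab))  # zebra -> UNK
```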

Download COCO dataset and pre-extract the image features

Download the COCO images from link. We need the 2014 training and 2014 validation images. Put train2014/ and val2014/ in the same directory, denoted $IMAGE_ROOT.

Then:

$ python scripts/prepro_feats.py --input_json data/dataset_coco.json --output_dir data/cocotalk --images_root $IMAGE_ROOT

prepro_feats.py extracts the ResNet-101 features (both the fc feature and the last conv feature) of each image. The features are saved in data/cocotalk_fc and data/cocotalk_att; the resulting files total about 200GB.

(Check the prepro scripts for more options, like other resnet models or other attention sizes.)
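As a rough sketch of what those two features are (assuming the standard torchvision ResNet layout; the actual script also resizes images so the attention map comes out 14x14):

```python
# Sketch of the two feature types; not the actual prepro_feats.py code.
import torch
import torchvision.models as models

resnet = models.resnet101(pretrained=True)
conv_body = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop pool+fc
conv_body.eval()

img = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    att_feat = conv_body(img)                      # (1, 2048, 7, 7) here;
                                                   # 14x14 with larger inputs
    fc_feat = att_feat.view(1, 2048, -1).mean(-1)  # (1, 2048) pooled vector
```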

Warm Start

To help the CIDEr-based REINFORCE algorithm converge faster and more stably, we first warm-start the captioning model by running the script below:

$ python train_warm.py --caption_model fc 

If you want to use the attention model, run:

$ python train_warm.py --caption_model att 

Alternatively, download our pretrained warm-start models from this link. The best CIDEr scores on the validation set are 90.1 for FC and 94.2 for attention.
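The warm start is standard teacher-forced cross-entropy training; a sketch of that objective (tensor shapes are placeholders, not the repo's actual interface):

```python
# Sketch of the warm-start objective: masked cross-entropy under teacher
# forcing. Shapes here are placeholders, not the repo's API.
import torch
import torch.nn.functional as F

def xe_loss(logits, targets, mask):
    # logits: (batch, T, vocab); targets: (batch, T) word ids;
    # mask: (batch, T), zero after each caption ends
    logp = F.log_softmax(logits, dim=2)
    picked = logp.gather(2, targets.unsqueeze(2)).squeeze(2)  # (batch, T)
    return -(picked * mask).sum() / mask.sum()
```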

Train using Self-critical

$ python train_sc_cider.py --caption_model att 

You will see a large boost in CIDEr score, but with many bad endings.
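Self-critical training rewards each sampled caption by how much its CIDEr beats the greedy-decoded baseline; a sketch of that loss (cider_score is a placeholder for the cider package's scorer, not its exact API):

```python
# Sketch of the self-critical (SCST) loss: the greedy caption's CIDEr is the
# baseline for the sampled caption's CIDEr.
import torch

def scst_loss(sample_logprobs, samples, greedy, refs, cider_score):
    # sample_logprobs: (batch,) summed log-probs of the sampled captions
    rewards = torch.tensor([cider_score(s, r) - cider_score(g, r)
                            for s, g, r in zip(samples, greedy, refs)])
    return -(rewards * sample_logprobs).mean()
```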

Train using Ngram constraint

First, preprocess the dataset to get the n-gram data:

$ python get_ngram.py

This generates fourgram.pkl and trigram.pkl in data/.

Then:

$ python train_fourgram.py  --caption_model fc 

It takes almost 40,000 iterations to converge; the experiment details are written to experiment.log in save_dir.
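One plausible reading of the 4-gram prior (the exact reward shaping is defined in the paper and train_fourgram.py, and may differ): suppress the reward of captions whose final 4-gram never occurs in the training corpus. A hedged sketch:

```python
# Hedged sketch of the 4-gram prior; assumes fourgram.pkl holds a set (or
# dict keyed by tuples) of training-corpus 4-grams.
import pickle

with open('data/fourgram.pkl', 'rb') as f:
    train_fourgrams = pickle.load(f)

def shaped_reward(cider_reward, words):
    # zero out the reward when the caption's ending 4-gram is unseen
    return cider_reward if tuple(words[-4:]) in train_fourgrams else 0.0
```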

Train using a Neural Language Model

First, train a neural language model, or download our pretrained LSTM language model from link:

$ python train_rnnlm.py

Then train the RL model with the neural language model constraint, starting from the same warm-start model:

$ python train_rnnlm_cider.py  --caption_model fc 

or

$ python train_rnnlm_cider.py  --caption_model att 

It takes almost 36,000 iterations to converge; the experiment details are written to experiment.log in save_dir.
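One plausible way the LM prior enters the reward (illustrative only; see train_rnnlm_cider.py for the actual rule): blend the CIDEr reward with the caption's average log-probability under the pretrained LSTM language model.

```python
# Hedged sketch of the neural-LM prior; both the additive form and the 0.1
# weight are illustrative assumptions, not the repo's actual rule.
def lm_shaped_reward(cider_reward, caption_ids, lm_logprob, weight=0.1):
    # lm_logprob: callable returning the mean per-word log-probability of
    # the caption under the pretrained LSTM language model
    return cider_reward + weight * lm_logprob(caption_ids)
```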


Evaluating CIDEr, METEOR, ROUGE-L, and BLEU scores with bad-ending removal

$ python Eval_model.py  --caption_model fc --rl_type fourgram
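"Bad ending removal" trims dangling function words from a caption before scoring; a sketch with a hypothetical stop-word list (the repo's actual list may differ):

```python
# Sketch of bad-ending removal; BAD_ENDINGS is a hypothetical example list.
BAD_ENDINGS = {'a', 'an', 'the', 'with', 'of', 'in', 'on', 'and'}

def remove_bad_ending(words):
    while words and words[-1] in BAD_ENDINGS:
        words = words[:-1]
    return words

print(remove_bad_ending('a man riding a'.split()))  # ['a', 'man', 'riding']
```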

Try another network structure

We also tried another network structure and obtained similar results. See MoreNet.md for more details.

Acknowledgements

Thanks to ruotianluo for the original self-critical implementation.

About

License: MIT License

