- This project is a PyTorch implementation of the paper "Show and Tell: A Neural Image Caption Generator". It may not follow the paper exactly.
- The code is written in PyTorch, and ResNet-101 is used for extracting image features. You can check pre-trained models here.
- The COCO 2017 validation set is used: images [5K/1GB] and annotations [241MB].
- For preprocessing, please check make_vocab.py and data_loader.py.
- Vocab.pickle is a pickle file which contains all the words in the annotations.
- coco_ids.npy stores the image IDs to be used. Set the paths and other settings, then execute the preprocess_idx function.
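Building the vocabulary pickle amounts to counting every word in the annotation captions and assigning each one an integer id. A minimal sketch of what make_vocab.py might do (function name, special tokens, and the inline captions are assumptions for illustration):

```python
import pickle
from collections import Counter

def build_vocab(captions, min_count=1):
    """Sketch: map every word seen in the annotation captions to an integer id."""
    counter = Counter()
    for cap in captions:
        counter.update(cap.lower().split())
    # reserved special tokens first, then words above the frequency threshold
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counter.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab

# illustrative captions; the real ones come from the COCO annotation JSON
captions = ["A woman holding a teddy bear", "A dog on a skateboard"]
vocab = build_vocab(captions)
with open("vocab.pickle", "wb") as f:
    pickle.dump(vocab, f)
print(len(vocab))  # 12
```

The same word-to-id mapping is then loaded by data_loader.py to turn captions into index tensors.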
- You can run the source code and try out your own examples.
- Python 3.8.5
- PyTorch 1.7.1
- CUDA 11.0
- For training:
  - `cd src`
  - `python train.py`
- For testing:
  - `cd src`
  - `python sample.py`
- Sample result at epoch 100:
  - Caption: "a woman holding a teddy bear in a suit case"
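A caption like the one above is typically produced by greedy decoding: the decoder takes the image feature, then repeatedly feeds back its most likely word until an end token appears. A minimal sketch of such a decoder (class name, sizes, and the end-token id are assumptions, not the repository's actual code):

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Sketch: LSTM decoder that turns an image feature into a word-id sequence."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def sample(self, feature, max_len=20, end_id=2):
        """Greedy decoding: feed the image feature first, then each predicted word."""
        ids = []
        inputs = feature.unsqueeze(1)  # (1, 1, embed_size)
        states = None
        for _ in range(max_len):
            out, states = self.lstm(inputs, states)
            word_id = self.fc(out.squeeze(1)).argmax(1)  # most likely next word
            ids.append(word_id.item())
            if word_id.item() == end_id:  # stop at the assumed <end> token
                break
            inputs = self.embed(word_id).unsqueeze(1)
        return ids

decoder = DecoderRNN(embed_size=256, hidden_size=512, vocab_size=100)
caption_ids = decoder.sample(torch.randn(1, 256))
```

The returned ids would then be mapped back to words through the vocabulary pickle.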
- TensorBoard
- Description of the model and other details
- Code Refactoring
- Upload requirements.txt