- This project is a PyTorch implementation of the paper "Show and Tell: A Neural Image Caption Generator". It may not follow the paper exactly.
- The code is written in PyTorch, and ResNet-101 is used for extracting image features. You can check pre-trained models here.
- The COCO 2017 validation set is used: images [5K/1GB] and annotations [241MB].
- For preprocessing, please check make_vocab.py and data_loader.py.
- Vocab.pickle is a pickle file which contains all the words in the annotations.
- coco_ids.npy stores the image IDs to be used. Set the paths and other settings, then execute the preprocess_idx function.
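Building the vocabulary pickle amounts to counting every word in the annotation captions and assigning each one an integer id. A minimal sketch of what make_vocab.py might do (function name, special tokens, and the inline captions are assumptions for illustration):

```python
import pickle
from collections import Counter

def build_vocab(captions, min_count=1):
    """Sketch: map every word seen in the annotation captions to an integer id."""
    counter = Counter()
    for cap in captions:
        counter.update(cap.lower().split())
    # reserved special tokens first, then words above the frequency threshold
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counter.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab

# illustrative captions; the real ones come from the COCO annotation JSON
captions = ["A woman holding a teddy bear", "A dog on a skateboard"]
vocab = build_vocab(captions)
with open("vocab.pickle", "wb") as f:
    pickle.dump(vocab, f)
print(len(vocab))  # 12
```

The same word-to-id mapping is then loaded by data_loader.py to turn captions into index tensors.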
- You can run the source code and try out your own examples.
- Python 3.8.5
- PyTorch 1.7.1
- CUDA 11.0
- For training:
  - `cd src`
  - `python train.py`
- For testing:
  - `cd src`
  - `python sample.py`
- Sample result at epoch 100:
  - Caption: "a woman holding a teddy bear in a suit case"
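A caption like the one above is typically produced by greedy decoding: the decoder takes the image feature, then repeatedly feeds back its most likely word until an end token appears. A minimal sketch of such a decoder (class name, sizes, and the end-token id are assumptions, not the repository's actual code):

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Sketch: LSTM decoder that turns an image feature into a word-id sequence."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def sample(self, feature, max_len=20, end_id=2):
        """Greedy decoding: feed the image feature first, then each predicted word."""
        ids = []
        inputs = feature.unsqueeze(1)  # (1, 1, embed_size)
        states = None
        for _ in range(max_len):
            out, states = self.lstm(inputs, states)
            word_id = self.fc(out.squeeze(1)).argmax(1)  # most likely next word
            ids.append(word_id.item())
            if word_id.item() == end_id:  # stop at the assumed <end> token
                break
            inputs = self.embed(word_id).unsqueeze(1)
        return ids

decoder = DecoderRNN(embed_size=256, hidden_size=512, vocab_size=100)
caption_ids = decoder.sample(torch.randn(1, 256))
```

The returned ids would then be mapped back to words through the vocabulary pickle.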
- TensorBoard
- Description of the model and other details
- Code Refactoring
- Upload requirements.txt