Image captioning with Keras, using the pretrained VGG16 model (CNN) and an RNN.

Image Caption

You can download the Microsoft COCO dataset from here.

The pretrained VGG16 model can be downloaded from here.

If you are in **, you can also download it from Baidu Netdisk here.

Then unzip COCO and set the following paths in image_caption_keras.py:

vgg_model_weights = '/home/qhduan/Downloads/COCO/vgg16_weights.h5'
coco_train = '/home/qhduan/Downloads/COCO/train2014'
coco_caption = '/home/qhduan/Downloads/COCO/annotations/captions_train2014.json'
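
For orientation, here is a minimal sketch (not the repo's exact code) of how the caption annotations can be loaded and tokenized with NLTK's punkt tokenizer, which is where numbers like train_size, vocabulary_size and max_len in the training log below come from; the exact vocabulary size also depends on any start/end tokens or frequency cut-off the script applies.

import json
from collections import Counter

import nltk

nltk.download('punkt')  # the script fetches the punkt tokenizer on first run

coco_caption = '/home/qhduan/Downloads/COCO/annotations/captions_train2014.json'

with open(coco_caption) as f:
    data = json.load(f)

print('train_size', len(data['images']))  # 82783 images in train2014

# Tokenize every reference caption and track word frequencies and length.
counter = Counter()
max_len = 0
for ann in data['annotations']:
    tokens = nltk.word_tokenize(ann['caption'].lower())
    counter.update(tokens)
    max_len = max(max_len, len(tokens))

print('raw vocabulary size', len(counter))
print('max_len', max_len)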

Open preview.ipynb to view the training and test results.
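
The (256, 4096) arrays in the log below are batches of VGG16 fc2 features. image_caption_keras.py loads these weights from vgg16_weights.h5 directly; the sketch below instead uses the VGG16 bundled with keras.applications (Keras 2 API) as an assumed, roughly equivalent way to obtain the same 4096-dimensional features.

import os

import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Model
from keras.preprocessing import image

coco_train = '/home/qhduan/Downloads/COCO/train2014'

base = VGG16(weights='imagenet', include_top=True)
# Cut the network at the second fully connected layer ('fc2', 4096 units).
extractor = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

def extract_feature(path):
    # Return the (4096,) fc2 activation for one image file.
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]

sample = os.path.join(coco_train, sorted(os.listdir(coco_train))[0])
print(extract_feature(sample).shape)  # (4096,)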

$ python3 image_caption_keras.py
Using Theano backend.
Using gpu device 0: GeForce GTX 1070 (CNMeM is disabled, cuDNN 5105)
/home/qhduan/.local/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
[nltk_data] Downloading package punkt to /home/qhduan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
train_size 82783
100%|████████████████████████████████████████████████████████████████████████████████████████| 82783/82783 [00:06<00:00, 12363.22it/s]
vocabulary_size 8679
max_len 55
train_words_size 935568
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
(256, 4096) (256, 55) (256, 8679)
Epoch 1/20
935680/935680 [==============================] - 980s - loss: 4.1896 - acc: 0.2899          
Epoch 2/20
935680/935680 [==============================] - 980s - loss: 3.1131 - acc: 0.4088     
Epoch 3/20
935680/935680 [==============================] - 1024s - loss: 2.8843 - acc: 0.4295    
Epoch 4/20
935680/935680 [==============================] - 1053s - loss: 2.7526 - acc: 0.4415     
Epoch 5/20
935680/935680 [==============================] - 1053s - loss: 2.6622 - acc: 0.4504     
Epoch 6/20
935680/935680 [==============================] - 1001s - loss: 2.5747 - acc: 0.4587     
Epoch 7/20
935680/935680 [==============================] - 988s - loss: 2.4988 - acc: 0.4663      
Epoch 8/20
935680/935680 [==============================] - 1060s - loss: 2.4339 - acc: 0.4740     
Epoch 9/20
935680/935680 [==============================] - 1032s - loss: 2.3833 - acc: 0.4802     
Epoch 10/20
935680/935680 [==============================] - 1005s - loss: 2.3305 - acc: 0.4866     
Epoch 11/20
935680/935680 [==============================] - 1007s - loss: 2.2816 - acc: 0.4927     
Epoch 12/20
935680/935680 [==============================] - 1063s - loss: 2.2408 - acc: 0.4987     
Epoch 13/20
935680/935680 [==============================] - 1031s - loss: 2.1983 - acc: 0.5048     
Epoch 14/20
935680/935680 [==============================] - 995s - loss: 2.1705 - acc: 0.5086      
Epoch 15/20
935680/935680 [==============================] - 991s - loss: 2.1432 - acc: 0.5122     
Epoch 16/20
935680/935680 [==============================] - 984s - loss: 2.1109 - acc: 0.5173      
Epoch 17/20
935680/935680 [==============================] - 978s - loss: 2.0837 - acc: 0.5222      
Epoch 18/20
935680/935680 [==============================] - 978s - loss: 2.0582 - acc: 0.5257     
Epoch 19/20
935680/935680 [==============================] - 978s - loss: 2.0487 - acc: 0.5281     
Epoch 20/20
935680/935680 [==============================] - 978s - loss: 2.0179 - acc: 0.5325
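
The batch shapes printed above ((256, 4096) image features, (256, 55) padded caption prefixes, (256, 8679) one-hot next words) are consistent with the standard merge-style captioning model. Below is a hedged reconstruction under that assumption, written with the Keras 2 functional API; the actual layer types and sizes in image_caption_keras.py may differ.

from keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from keras.models import Model

vocabulary_size = 8679
max_len = 55

# Image branch: project the 4096-d VGG16 fc2 feature to the RNN state size.
img_in = Input(shape=(4096,))
img_feat = Dense(256, activation='relu')(Dropout(0.5)(img_in))

# Text branch: embed the partial caption and run an LSTM over it.
txt_in = Input(shape=(max_len,))
txt_emb = Embedding(vocabulary_size, 256, mask_zero=True)(txt_in)
txt_feat = LSTM(256)(Dropout(0.5)(txt_emb))

# Merge both branches and predict the next word over the vocabulary.
merged = add([img_feat, txt_feat])
out = Dense(vocabulary_size, activation='softmax')(Dense(256, activation='relu')(merged))

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()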
