An image captioning model trained on the MS-COCO dataset. It uses VGG16 as the base neural network, with three GRU layers added on top.
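The architecture described above could be wired up roughly as follows in Keras. This is only a sketch under stated assumptions, not necessarily how this repository implements it: the vocabulary size, embedding size, GRU state size, the choice of VGG16's `fc2` layer as the image feature, and the dense layer that maps the image feature to the GRU initial state are all illustrative guesses. Only "VGG16 base plus three GRU layers" comes from the description. The function name `build_captioning_model` is hypothetical.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

# All sizes below are assumptions chosen for illustration.
VOCAB_SIZE = 10000   # number of distinct caption tokens
EMBED_DIM = 128      # token embedding size
STATE_SIZE = 512     # GRU hidden state size
NUM_GRU_LAYERS = 3   # "three layers of GRU" from the description


def build_captioning_model(weights=None):
    """Build a VGG16 encoder + 3-layer GRU decoder (illustrative only)."""
    # Encoder: VGG16 cut at its second fully connected layer (4096-d output).
    vgg = VGG16(include_top=True, weights=weights)
    encoder = Model(vgg.input, vgg.get_layer("fc2").output)
    encoder.trainable = False  # keep the base network frozen

    image_in = layers.Input(shape=(224, 224, 3), name="image")
    features = encoder(image_in)

    # Assumption: project the image feature to the GRU state size and use
    # it as the initial state of every GRU layer.
    init_state = layers.Dense(STATE_SIZE, activation="tanh")(features)

    # Decoder: caption tokens in, next-token logits out at each time step.
    tokens_in = layers.Input(shape=(None,), dtype="int32", name="tokens")
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens_in)
    for _ in range(NUM_GRU_LAYERS):
        gru = layers.GRU(STATE_SIZE, return_sequences=True)
        x = gru(x, initial_state=[init_state])
    logits = layers.Dense(VOCAB_SIZE)(x)

    return Model([image_in, tokens_in], logits)
```

Calling `build_captioning_model(weights="imagenet")` would load the pretrained ImageNet weights for VGG16 (downloaded on first use), while `weights=None` builds the same graph with random initialization.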
MIT License