marreddysainikhilreddy / Automatic-Image-Captioning

I have created a simple and effective neural network architecture, combining a deep vision CNN with a language-generating RNN, to automatically generate captions for images using the Microsoft Common Objects in Context (MS COCO) dataset.

Image Captioning Model

Steps followed in this project:

1. The dataset used is Microsoft's COCO (Common Objects in Context) dataset.

2. Feature vectors for images are generated using a CNN based on the ResNet architecture, developed by Microsoft Research.

3. Word embeddings are generated from the captions of the training images; NLTK is used to tokenize and preprocess the captions.

4. Implemented an RNN decoder using LSTM cells.

5. Trained the network for approximately 3 hours on a GPU, reaching an average loss of about 2%.

6. Ran the trained network on several test images to qualitatively assess its performance.
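Step 3 above (caption preprocessing) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the special-token names, the frequency threshold, and the use of `str.split` (where the project uses NLTK's `word_tokenize`) are all assumptions.

```python
from collections import Counter

def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer id, reserving
    special tokens. (Token names and threshold are assumptions; the
    project tokenizes with NLTK's word_tokenize, sketched here as
    str.split for self-containment.)"""
    counts = Counter(w for c in captions for w in c.lower().split())
    vocab = {"<start>": 0, "<end>": 1, "<unk>": 2}
    for word, n in counts.items():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(caption, vocab):
    """Convert a caption to a list of ids wrapped in <start>/<end>."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in caption.lower().split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]

vocab = build_vocab(["A dog runs", "A cat sleeps"])
print(encode("A dog sleeps", vocab))  # → [0, 3, 4, 7, 1]
```

The integer sequences produced this way are what the decoder's embedding layer consumes during training.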
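Steps 2 and 4 together form a standard encoder-decoder pipeline, which might be sketched in PyTorch as below. This is a hedged sketch, not the project's code: the embed/hidden/vocab sizes are illustrative, ResNet-50 is assumed (the README only says "ResNet"), and real training would load pretrained weights rather than `weights=None`.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """ResNet with its classifier head removed, followed by a linear
    projection of the pooled features to the caption embedding size."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(weights=None)  # real use: pretrained weights
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])  # drop fc
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():  # ResNet acts as a frozen feature extractor
            features = self.resnet(images)
        return self.embed(features.flatten(1))

class DecoderRNN(nn.Module):
    """LSTM decoder: the image feature is fed as the first 'word'."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: prepend the image feature, drop the last token.
        emb = self.embed(captions[:, :-1])
        inputs = torch.cat([features.unsqueeze(1), emb], dim=1)
        out, _ = self.lstm(inputs)
        return self.fc(out)  # (batch, seq_len, vocab_size) logits

# Smoke test on random data (all sizes are illustrative assumptions).
enc, dec = EncoderCNN(256), DecoderRNN(256, 512, 1000)
feats = enc(torch.randn(2, 3, 224, 224))
logits = dec(feats, torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```

Training would compare these logits against the shifted ground-truth caption with cross-entropy loss.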

Network Architecture

Results:

Here are some example captions generated by the network on images from the Microsoft COCO dataset:
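At test time, outputs like these are typically produced by greedy decoding: the image feature is fed in first, and the most likely word is fed back in at each step. A minimal sketch, assuming an LSTM decoder with `embed`/`lstm`/`fc` submodules and an `<end>` token at index 1 (both assumptions, stubbed here with a tiny randomly initialized decoder):

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Minimal stand-in for the trained LSTM decoder (sizes illustrative)."""
    def __init__(self, embed_size=16, hidden_size=32, vocab_size=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

def greedy_caption(decoder, feature, max_len=20, end_idx=1):
    """Greedy decoding: start from the image feature, then repeatedly
    feed the argmax word back in until <end> (index 1 here, an
    assumption) or the length limit is reached."""
    words, states = [], None
    inputs = feature.unsqueeze(1)  # (batch=1, seq=1, embed_size)
    for _ in range(max_len):
        out, states = decoder.lstm(inputs, states)
        idx = decoder.fc(out.squeeze(1)).argmax(dim=1)
        if idx.item() == end_idx:
            break
        words.append(idx.item())
        inputs = decoder.embed(idx).unsqueeze(1)
    return words  # word ids; map back through the vocabulary for text

dec = TinyDecoder()
caption_ids = greedy_caption(dec, torch.randn(1, 16))
```

Beam search would generally give better captions than this greedy loop, at the cost of more computation per image.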




Languages

HTML 53.6% · Jupyter Notebook 46.1% · Python 0.4%