d-behl / lstm-attention-captioning

Image Captioning using LSTMs and Attention

lstm-attention-captioning

The model generates a textual description of an image, combining Natural Language Processing and Computer Vision. A pre-trained CNN (MobileNet v2) serves as the feature extractor, and an LSTM network generates the caption from the extracted features. Attention lets each word in the generated caption depend on different parts of the image.
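To make the description above concrete, here is a minimal sketch of a single attention-based decoding step in PyTorch. It is not the repository's code: the class name `AttentionDecoderStep`, the layer sizes, the vocabulary size, and the use of scaled dot-product attention are all assumptions chosen for illustration (the actual notebook may use a different attention form). A random tensor stands in for the MobileNet v2 feature map, which has 1280 channels on a 7x7 grid for a 224x224 input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionDecoderStep(nn.Module):
    """Hypothetical single decoding step: attend over image regions, then update the LSTM."""

    def __init__(self, feat_dim=1280, embed_dim=256, hidden_dim=512, vocab_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # word embeddings
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)      # project regions into the hidden space
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)          # scores over the vocabulary

    def forward(self, features, prev_word, state):
        # features: (batch, regions, feat_dim), prev_word: (batch,), state: (h, c)
        h, c = state
        keys = self.feat_proj(features)                                   # (batch, regions, hidden)
        scores = torch.bmm(keys, h.unsqueeze(2)).squeeze(2)               # (batch, regions)
        scores = scores / keys.size(-1) ** 0.5                            # scaled dot product (assumed form)
        weights = F.softmax(scores, dim=1)                                # attention over image regions
        context = torch.bmm(weights.unsqueeze(1), features).squeeze(1)    # (batch, feat_dim)
        lstm_in = torch.cat([self.embed(prev_word), context], dim=1)
        h, c = self.lstm(lstm_in, (h, c))
        return self.out(h), weights, (h, c)


torch.manual_seed(0)
batch = 2
# Stand-in for CNN output; a real pipeline would take this from the feature extractor.
feature_map = torch.randn(batch, 1280, 7, 7)
features = feature_map.flatten(2).transpose(1, 2)             # (batch, 49 regions, 1280)

step = AttentionDecoderStep()
state = (torch.zeros(batch, 512), torch.zeros(batch, 512))
prev_word = torch.zeros(batch, dtype=torch.long)              # hypothetical <start> token id 0
logits, weights, state = step(features, prev_word, state)
print(logits.shape, weights.shape)
```

Each call yields vocabulary scores for the next word along with one attention weight per image region. Repeating the step with the previously chosen word and the updated state produces a caption in which every word can attend to a different part of the image.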

Implemented in PyTorch 1.4.

deep-learning

MIT License

Languages

Jupyter Notebook 100.0%, Python 0.0%