cnn image-caption-generator image-processing lstm nltk streamlit tensorflow text-processing vgg16

Image Caption Generation using Deep Learning

Demo

Link: https://imgcaptiongen.streamlit.app/

Note: If the website link provided above is not working, it might mean that the deployment has been stopped or there are technical issues. We apologize for any inconvenience.

Please consider giving a ⭐ to the repository if you find this app useful.
A quick preview of the Image Caption Generator app:

Overview

This repository contains code for an image caption generation system using deep learning techniques. The system leverages a pretrained VGG16 model for feature extraction and a custom captioning model which was trained using LSTM for generating captions. The model is trained on the Flickr8k dataset using an attention mechanism to improve caption quality.

Note: While using the VGG16 model for feature extraction provides accurate results, it's important to be mindful of memory usage. The VGG16 model can consume a significant amount of memory, potentially causing issues in resource-constrained environments. To address this, it's advised to consider using the MobileNetV2 model for feature extraction. MobileNetV2 strikes a balance between memory efficiency and performance, making it a practical choice for scenarios with limited resources. Consequently, in my deployed app, I've opted for MobileNetV2.

The key components of the project include:

Image feature extraction using a pretrained VGG16 model (Consider using MobileNetV2 for memory efficiency)
Caption preprocessing and tokenization
Custom captioning model architecture with attention mechanism
Model training and evaluation
Streamlit app for interactive caption generation

About the Dataset

The Flickr8k dataset is used for training and evaluating the image captioning system. It consists of 8,091 images, each with five captions describing the content of the image. The dataset provides a diverse set of images with multiple captions per image, making it suitable for training caption generation models.

Download the dataset from Kaggle and organize the files as follows:

flickr8k
- Images
  - (image files)
- captions.txt

Installation

This project is written in Python 3.10.12. If you don't have Python installed, you can download it from the official website. If you have an older version of Python, you can upgrade it using the pip package manager, which should be already installed if you have Python 2 >=2.7.9 or Python 3 >=3.4 on your system. To install the required packages and libraries, you can use pip and the provided requirements.txt file. First, clone this repository to your local machine using the following command:

https://github.com/Sajid030/image-caption-generator.git

Once you have cloned the repository, navigate to the project directory and run the following command in your terminal or command prompt:

pip install -r requirements.txt

This will install all the necessary packages and libraries needed to run the project.

Deployement on Streamlit

Create an account on Streamlit Sharing.
Fork this repository to your GitHub account.
Log in to Streamlit Sharing and create a new app.
Connect your GitHub account to Streamlit Sharing and select this repository.
Set the following configuration variables in the Streamlit Sharing dashboard:

[server]
headless = true
port = $PORT
enableCORS = false

Click on "Deploy app" to deploy the app on Streamlit Sharing.

Directory Tree

|   app.py
|   image-captioner.ipynb
|   LICENSE.md
|   mymodel.h5
|   README.md
|   requirements.txt
|   tokenizer.pkl
\---resource
        demo.gif

Bug / Feature Request

If you encounter any bugs or issues with the loan status predictor app, please let me know by opening an issue on my GitHub repository. Be sure to include the details of your query and the expected results. Your feedback is valuable in helping me improve the app for all users. Thank you for your support!

Future Scope

Fine-tuning: Experiment with fine-tuning the captioning model architecture and hyperparameters for improved performance.
Dataset Expansion: Incorporate additional datasets to increase the diversity and complexity of the trained model for example we can train the model on Flickr30k dataset.
Beam Search: Implement beam search decoding for generating multiple captions and selecting the best one.
User Interface Enhancements: Improve the Streamlit app's user interface and add features such as image previews and caption confidence scores.
Multilingual Captioning: Extend the model to generate captions in multiple languages by incorporating multilingual datasets.

About

Deep learning-based image captioning with Flickr8k dataset. Code includes data prep, model training, and a Streamlit app.

https://imgcaptiongen.streamlit.app/