TomatoFT / Image-Captioning

Image Captioning with EfficientNet and Transformer


Tech stack and Tools

  • TensorFlow
  • Streamlit
  • PostgreSQL
  • Visual Studio Code
  • Anaconda
  • Google Colab (Jupyter Notebook)

What is the image captioning problem?

    Image captioning is the process of generating a natural language description of an image. It is a task in the field of computer vision and natural language processing. The goal of image captioning is to generate a coherent and fluent sentence that accurately describes the image content.

    An image captioning system typically consists of two main components:

  • An image feature extractor: This component is responsible for extracting features from the input image, such as object locations, sizes, and colors.
  • A natural language generator: This component takes the image features as input and generates a natural language description of the image.

    The generated captions are typically evaluated using metrics such as BLEU, METEOR, ROUGE, and CIDEr.
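
To make the evaluation step more concrete, here is a minimal sketch of a simplified BLEU score. It assumes captions are already split into lowercase word tokens, uses only unigrams and bigrams, and applies no smoothing. The function names are hypothetical and this is not the evaluation code used in this project.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in the single reference where it is most frequent."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=2):
    """Geometric mean of clipped precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


references = ["a dog runs on the grass".split(),
              "a dog is running across a lawn".split()]
print(bleu("a dog runs on the lawn".split(), references))
```

The clipping in `modified_precision` is what stops a degenerate caption such as "the the the" from scoring well just by repeating a common word.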

How to run this project

    This project uses Streamlit to demo an EfficientNet + Transformer model (trained for 11 epochs) and connects to PostgreSQL to save information about each picture, along with some metadata, to a database.

    First, install Anaconda, PostgreSQL, and Python 3. The installation steps depend on your OS; I used Ubuntu for this project. The video tutorials below cover each install.

    PostgreSQL + pgAdmin III: https://www.youtube.com/watch?v=-LwI4HMR_Eg

    Python 3: https://www.youtube.com/watch?v=z3Hdewxuuoo

    Anaconda: https://www.youtube.com/watch?v=5kuqIFDouXY

    After installing these, follow the steps below.

    Clone the project

    git clone https://github.com/TomatoFT/Image-Captioning-with-Transformer
    cd Image-Captioning-with-Transformer
    

    Create and activate the Anaconda environment

    conda create --name image-captioning
    conda activate image-captioning
    

    Install dependencies

    conda install -c anaconda pip
    pip install -r requirements.txt
    

    Connect Streamlit to PostgreSQL

    Read this document from Streamlit: https://docs.streamlit.io/knowledge-base/tutorials/databases/postgresql#add-username-and-password-to-your-local-app-secrets. Then open pgAdmin III, choose the option to add a connection to a server, and fill in the connection form.

    [image: pgAdmin III connection form]

    In the .streamlit/secrets.toml file, replace the values below with your own PostgreSQL settings.

    [postgres]
    host="localhost"
    port=5432
    user="postgres"
    password="12345"
    database="postgres"
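
As a rough illustration of how these settings might be consumed, the sketch below validates the `[postgres]` section and turns it into keyword arguments for a database driver. `build_connect_kwargs` is a hypothetical helper, and the commented usage assumes the app reads `st.secrets` and connects with `psycopg2`; the project's actual code in web.py may differ.

```python
from typing import Any, Mapping

# Keys taken from the [postgres] section of .streamlit/secrets.toml.
REQUIRED_KEYS = ("host", "port", "user", "password", "database")


def build_connect_kwargs(section: Mapping[str, Any]) -> dict:
    """Check that every expected setting is present and return them as a dict.

    Hypothetical helper for illustration; not part of this project.
    """
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"missing PostgreSQL settings: {', '.join(missing)}")
    kwargs = {key: section[key] for key in REQUIRED_KEYS}
    kwargs["port"] = int(kwargs["port"])  # tolerate a port written as a string
    return kwargs


# Assumed usage inside the Streamlit app (needs streamlit and psycopg2):
#   import psycopg2
#   import streamlit as st
#   conn = psycopg2.connect(**build_connect_kwargs(st.secrets["postgres"]))

example = {"host": "localhost", "port": "5432", "user": "postgres",
           "password": "12345", "database": "postgres"}
print(build_connect_kwargs(example))
```

Failing early with a list of the missing keys tends to be easier to debug than a driver error raised later from inside the app.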
    

    Run the Streamlit demo

    streamlit run web.py
    

    Demo

    demo_image_captioning.mp4

    Exit the Anaconda environment

    conda deactivate
    

    Note

    The training notebook is here: https://colab.research.google.com/drive/1K2ZFaAUNIYV0L92XEsV56HSYaXi4DMDh?usp=sharing. I used it to train the model and saved the weights locally for the Streamlit deployment (you can find them at model/model_IC.h5).

    The model tutorial: https://keras.io/examples/vision/image_captioning/

    You can read the submitted report to understand how I carried out this project.

    Feel free to clone and use my code.
