SunYanCN / WhatsSee

A simple and humble image captioning application, based on a neural network built with Keras

WhatsSee

WhatsSee is a simple and humble image captioning application, based on a neural network built with Keras. The back-end is written in Python and the web GUI front-end is built with the Flask framework.

Getting started

  • Clone the repository
git clone https://github.com/luca-ant/WhatsSee.git

or

git clone git@github.com:luca-ant/WhatsSee.git
  • Install dependencies (Debian/Ubuntu with apt, or Arch Linux with pacman)
sudo apt install python3-setuptools
sudo apt install python3-pip
sudo apt install python3-venv

or

sudo pacman -S python-setuptools 
sudo pacman -S python-pip
sudo pacman -S python-virtualenv
  • Create a virtual environment and install the required modules
cd WhatsSee
python3 -m venv venv
source venv/bin/activate

python3 -m pip install -r requirements.txt
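
Before installing the requirements, it can help to confirm that the virtual environment is really active, so the packages do not end up in the system Python. The check below is a minimal sketch for environments created with `python3 -m venv`; the function name `in_virtualenv` is a hypothetical helper and not part of WhatsSee.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment.

    Inside a venv, sys.prefix points at the environment directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; run 'source venv/bin/activate' first.")
```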

Running

  • Training: Train the model. You can choose the dataset, the number of training and validation examples, and the number of epochs. All arguments are optional; use 0 as the value to select all examples. Caution: the whole dataset will be downloaded!
python whats_see.py train -d flickr -nt 6000 -nv 1000 -ne 50
  • Resume: Resume the last saved training run and continue it.
python whats_see.py resume
  • Evaluate: Evaluate the whole model on the test images and calculate BLEU scores. You can specify the number of test examples (use 0 as the value to select all examples).
python whats_see.py evaluate -n 1000
  • Test: Test the model by generating a caption for a test image and comparing it with the real captions.
python whats_see.py test -f TEST_IMAGE_FILE
  • Generate: Generate a caption for your own image.
python whats_see.py generate -f YOUR_IMAGE_FILE
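
When the commands above are driven from another script (for example, to queue several training runs), building the argument list in Python avoids quoting mistakes. The following is a minimal sketch that uses only the sub-commands and flags shown above; the helper names `train_command`, `generate_command` and `run` are hypothetical and do not exist in the repository.

```python
import subprocess
import sys

SCRIPT = "whats_see.py"  # entry point named in this README


def train_command(dataset=None, num_train=None, num_val=None, num_epochs=None):
    """Build the argument list for the `train` sub-command.

    Arguments left as None are omitted, because all of them are optional.
    A value of 0 is passed through and selects all examples.
    """
    cmd = [sys.executable, SCRIPT, "train"]
    options = (("-d", dataset), ("-nt", num_train),
               ("-nv", num_val), ("-ne", num_epochs))
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def generate_command(image_path):
    """Build the argument list for captioning a single image."""
    return [sys.executable, SCRIPT, "generate", "-f", str(image_path)]


def run(cmd):
    """Run a command from the WhatsSee directory and fail on a non-zero exit."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run(...) to actually start training,
    # which downloads the whole dataset.
    print(" ".join(train_command("flickr", 6000, 1000, 50)))
```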

Deployment

To deploy the web application, simply run the start_server.sh script. Then open a browser and navigate to localhost:4753.

./start_server.sh
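
If the page does not load, a quick way to tell whether the server is listening at all is to try a plain TCP connection to the port. This is a minimal sketch using only the standard library; the port 4753 comes from the text above, while the function name `is_port_open` is a hypothetical helper.

```python
import socket


def is_port_open(host="localhost", port=4753, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("server reachable:", is_port_open())
```

A successful connection only shows that something is listening on the port; it does not verify that the model has finished loading.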

[Three screenshots of the web application]

Result

A pre-trained model can be found on the releases page.

The neural network was trained on the training images of the Flickr dataset, and it achieved the following BLEU scores on the test images:

  • BLEU-1: 49.3%
  • BLEU-2: 30.5%
  • BLEU-3: 21.7%
  • BLEU-4: 11.1%
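
For readers unfamiliar with the metric, BLEU-n is commonly reported as a cumulative score: the geometric mean of the clipped n-gram precisions for orders 1 to n, multiplied by a brevity penalty for captions shorter than the references. The README does not say which tool produced the numbers above, so the following is only a minimal sketch of the standard corpus-level definition, not the project's evaluation code; `ngrams` and `corpus_bleu` are hypothetical names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n):
    """Cumulative corpus-level BLEU with uniform weights over orders 1..max_n.

    references: one entry per image, each a list of reference token lists
    hypotheses: one generated token list per image
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Closest reference length, ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each n-gram count by its maximum count in any one reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any order without a match gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = [[["a", "dog", "runs", "on", "the", "grass"]]]
    hyps = [["a", "dog", "sits", "on", "the", "grass"]]
    for n in range(1, 5):
        print(f"BLEU-{n}: {corpus_bleu(refs, hyps, n):.3f}")
```

Because higher orders require longer exact matches, the scores can only stay equal or fall from BLEU-1 to BLEU-4, which matches the pattern in the list above.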

Examples

[Four screenshots of example captions]

Credits

  • WhatsSee was developed by Luca Antognetti

License: GNU General Public License v3.0


Languages

Python 66.0%, HTML 15.1%, CSS 9.7%, JavaScript 9.0%, Shell 0.2%