WebDNN Image Caption Generator
A Chrome extension that uses WebDNN to generate a caption when you hover over an image.
This repo demonstrates how to use WebDNN on the client side, in this case inside a Chrome extension. The extension boilerplate was generated with the Chrome Extension generator.
Getting Started
For development, load (or reload) the extension after starting gulp watch so that live reload works.
# npm watch
gulp watch --sass
The Steps
There are two main parts: converting the models with WebDNN and running them on the client side.
- Converting the pre-trained caption_gen_resnet.model model and the coco.pkl dataset with WebDNN. Since Chrome doesn't support WebGPU yet, we use WebAssembly as the backend. The converted models are already generated and placed in the app/ folder, but if you want to modify the models, you can run the following command.
# npm run convert:webdnn
python py/convert_webdnn.py --backend webassembly --sentence datasets/coco.pkl --model models/caption_gen_resnet.model && mv ./webdnn ./app
- Using WebDNN to load the image feature model and the caption generation model in the browser.
- Generating the image feature by passing the image array into the image feature model.
- Generating the caption by passing the image feature into the caption generation model.
Measurements
The table below shows the average time of each step when generating a sentence, measured over 10 runs. Image feature extraction takes about 5.8 seconds, the longest of all the steps.
Measurements | Average Time (ms) |
---|---|
Load Image Runner | 2608.9 |
Load Caption Runner | 1143.44 |
Extract image feature | 5803.77 |
Generate Caption from image feature | 511.77 |
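As a rough way to read the table above, the sketch below sums the four averages and prints each step's share of the total. The `per_image_ms` figure assumes that both runners are loaded once and then reused for later images; the measurements do not state this, so treat it as an illustration only.

```python
# Average timings in milliseconds, copied from the table above.
timings_ms = {
    "Load Image Runner": 2608.9,
    "Load Caption Runner": 1143.44,
    "Extract image feature": 5803.77,
    "Generate Caption from image feature": 511.77,
}

# End-to-end time for a first caption, including loading both runners.
total_ms = sum(timings_ms.values())

for step, ms in timings_ms.items():
    print(f"{step:<38} {ms:>8.2f} ms  {ms / total_ms:6.1%}")
print(f"{'Total':<38} {total_ms:>8.2f} ms")

# Assumption: the runners stay loaded, so later images only pay for
# feature extraction and caption generation.
per_image_ms = (
    timings_ms["Extract image feature"]
    + timings_ms["Generate Caption from image feature"]
)
print(f"{'Per image (runners already loaded)':<38} {per_image_ms:>8.2f} ms")
```

The total comes to roughly 10.07 seconds, and feature extraction accounts for more than half of it, so that step is the natural place to look for speed-ups.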
Troubleshooting
FileNotFoundError: [Errno 2] No such file or directory: 'python2': 'python2'
- Install Python 2:
brew install python@2
- Add this line to ~/.bashrc:
export PATH="/usr/local/opt/python@2/bin:$PATH"
- Reload the Bash startup file:
source ~/.bashrc
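To check whether the PATH change took effect, a small script like the following can report where `python2` resolves. It is only a sketch: `find_executable` is a helper name made up for this example, and it relies on nothing beyond the standard library's `shutil.which`.

```python
import shutil


def find_executable(name="python2"):
    """Return the absolute path of `name` on PATH, or None if it is not found."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("python2 is not on PATH; re-check ~/.bashrc and reload it")
    else:
        print("python2 found at", path)
```

Run it from the same shell session that will run the conversion, since a different terminal may not have picked up the updated ~/.bashrc yet.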
fatal error: 'Eigen/Dense' file not found
This error occurs because the Eigen3 headers are not on the compiler's include path.
- Install Eigen:
brew install eigen
- Add the following lines to the file
$(brew --prefix)/lib/python3.6/site-packages/webdnn/backend/webassembly/generator.py
# After both line 54 and 83
# args = ["em++"]
args.append("-I")
args.append("/usr/local/include/eigen3")
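The include path above is hard-coded to /usr/local, which is Homebrew's default prefix on Intel Macs; on Apple Silicon the default is /opt/homebrew. The sketch below shows one way to build the same two arguments from the actual prefix. `eigen_include_args` is a hypothetical helper written for this example, not part of WebDNN, and it assumes Eigen's headers live under `<prefix>/include/eigen3` as they do in the lines above.

```python
import subprocess


def eigen_include_args(prefix=None):
    """Return the extra em++ arguments that add Homebrew's eigen3 headers.

    `prefix` is the Homebrew prefix. When omitted, ask `brew --prefix` and
    fall back to /usr/local (the path used above) if brew is unavailable.
    """
    if prefix is None:
        try:
            result = subprocess.run(
                ["brew", "--prefix"],
                stdout=subprocess.PIPE,
                universal_newlines=True,
                check=True,
            )
            prefix = result.stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            prefix = "/usr/local"
    return ["-I", prefix.rstrip("/") + "/include/eigen3"]


# Mirrors the patched lines: start from the compiler name, then add the include path.
args = ["em++"]
args.extend(eigen_include_args("/usr/local"))
print(args)
```

Calling `eigen_include_args()` with no argument looks up the prefix at run time, which keeps the patch working if Homebrew is installed somewhere other than /usr/local.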
References
- https://github.com/mil-tokyo/webdnn - WebDNN runs deep neural network (DNN) pre-trained model on web browser.
- https://milhidaka.github.io/chainer-image-caption/ - Generating image caption demo
- https://github.com/milhidaka/webdnn-exercise - Exercise of basic usage of WebDNN