Bubble Detector using YOLOv4

Note: This is not the final version of the code. I will refine and update it.

Overview

This model detects speech bubbles in webtoons and cartoons. It is built on pytorch-YOLOv4, which I referenced and adapted for speech bubble detection. The key to improving performance is data analysis: speech bubbles take many different forms, so I first define those forms and then report training results that take the distribution of the data into account.

Definition of Speech Bubble

Various speech bubble forms of real webtoons

Speech bubbles in real webtoons come in a wide variety of colors and shapes.

New Definition

Key standards for data definition: Shape, Color, Form

Standards

Shape : Ellipse(tawon), Thorn(gasi), Sea_urchin(seonggye), Rectangle(sagak), Cloud(gurm)
Color : Black/white(bw), Colorful(color), Transparency(tran), Gradation
Form : Basic, Double Speech bubble, Multi-External, Scatter-type
example image
In this project, only two of these categories are applied: shape and color. Form and Gradation are classified as etc.

classes

These classes are not detection classes. They describe the distribution of the speech bubble data.
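
Combining the shape and color standards gives 5 x 3 = 15 distribution classes, which matches the columns of the validation table in the Metric section (Gradation is left out because it is grouped as etc.). A minimal sketch of how those labels could be generated is shown here. The function name `distribution_classes` is hypothetical, and the tags are copied from the validation table header rather than from the project code.

```python
from itertools import product

# Shape and color tags as they appear in the validation table header.
# Assumption: the project may store these labels differently.
SHAPES = ["tawon", "gasi", "seonggye", "sagak", "gurm"]
COLORS = ["bw", "color", "Transparency"]


def distribution_classes(shapes=SHAPES, colors=COLORS):
    """Return one label per (shape, color) pair, grouped by shape."""
    return [f"{shape}_{color}" for shape, color in product(shapes, colors)]


classes = distribution_classes()
print(len(classes))   # number of distribution classes
print(classes[:3])    # the three ellipse (tawon) classes
```

These labels are only used to analyze the dataset. The detector itself is trained with a single class, as described in the Train section.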

Install dependencies

PyTorch Version
- PyTorch 1.4.0 for TensorRT 7.0 and higher
- PyTorch 1.5.0 and 1.6.0 for TensorRT 7.1.2 and higher

Install Dependencies Code

pip install onnxruntime numpy torch tensorboardX scikit_image tqdm easydict Pillow opencv_python pycocotools

pip install -r requirements.txt

Pretrained model

Model	Link
YOLOv4	Link
YOLOv4-bubble	Link

Train

1. Download weights

2. Train

python train.py -g gpu_id -classes num_classes -dir 'data_dir' -pretrained 'pretrained_model.pth'

Train.sh

3. Config setting
- cfg.py
  - class = 1
  - learning_rate = 0.001
  - max_batches = 2000 (class * 2000)
  - steps = [1600, 1800] (max_batches * 0.8, max_batches * 0.9)
  - train_dir = your dataset root
    - root tree
      
      The image folder contains .jpg or .png image files. The XML folder contains the .XML label files.
- cfg/yolov4.cfg
  - classes = 1
  - filters = 18, i.e. (4 + 1 + classes) * 3 (lines 961, 1049, 1137)
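
The arithmetic in the settings above can be reproduced for any number of classes. The following is a minimal sketch, not code from this repository. The function name `derive_yolo_settings` and the returned dictionary keys are hypothetical, and it assumes three anchors per detection scale, which is where the factor of 3 in the filter formula comes from.

```python
def derive_yolo_settings(num_classes, anchors_per_scale=3):
    """Derive the class-dependent values listed in the config notes."""
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    filters = (4 + 1 + num_classes) * anchors_per_scale
    # Rule of thumb from the notes above: 2000 batches per class.
    max_batches = num_classes * 2000
    # Learning-rate steps at 80% and 90% of max_batches (integer math).
    steps = [max_batches * 8 // 10, max_batches * 9 // 10]
    return {
        "classes": num_classes,
        "filters": filters,
        "max_batches": max_batches,
        "steps": steps,
    }


settings = derive_yolo_settings(1)
print(settings)
```

With one class this gives 18 filters, 2000 batches, and steps of 1600 and 1800, matching the values listed above.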

To train on a custom dataset, adjust the settings above to match your data.
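
Before training on a custom dataset, it can help to check that every image has a label file. The sketch below assumes the dataset root holds two folders named `image` and `XML`, and that an image and its label share the same file name stem. Both the folder names and the matching rule are assumptions, and `find_unlabeled_images` is a hypothetical helper that is not part of this repository.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def find_unlabeled_images(root, image_dir="image", xml_dir="XML"):
    """Return image file names that have no XML file with the same stem."""
    root = Path(root)
    xml_stems = {
        p.stem for p in (root / xml_dir).iterdir() if p.suffix.lower() == ".xml"
    }
    return sorted(
        p.name
        for p in (root / image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in xml_stems
    )


# Small self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "image").mkdir()
    (demo_root / "XML").mkdir()
    for name in ("a.jpg", "b.png"):
        (demo_root / "image" / name).touch()
    (demo_root / "XML" / "a.xml").touch()  # only a.jpg is labeled
    missing = find_unlabeled_images(demo_root)

print(missing)
```

An empty list means every image in the folder has a matching label file.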

Demo

1. Download weights

2. Demo

python demo.py -cfgfile cfgfile -weightfile pretrained_model.pth -imgfile image_dir

The default cfgfile is ./cfg/yolov4.cfg

Metric

1. validation dataset

tawon_bw	tawon_color	tawon_Transparency	gasi_bw	gasi_color	gasi_Transparency	seonggye_bw	seonggye_color	seonggye_Transparency	sagak_bw	sagak_color	sagak_Transparency	gurm_bw	gurm_color	gurm_Transparency	total
116	70	68	65	29	59	51	43	44	42	33	69	47	2	12	750

The distribution above counts individual speech bubbles, not cuts.
The distribution is not uniform because a single cut can contain several speech bubbles. In addition, examples of some classes are hard to find, which leads to the imbalance shown above.
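
The imbalance can be quantified directly from the validation counts. This is a small sketch for illustration only. The counts are copied from the table above, while the variable names are hypothetical and not taken from the project code.

```python
# Validation counts per distribution class, copied from the table above.
VAL_COUNTS = {
    "tawon_bw": 116, "tawon_color": 70, "tawon_Transparency": 68,
    "gasi_bw": 65, "gasi_color": 29, "gasi_Transparency": 59,
    "seonggye_bw": 51, "seonggye_color": 43, "seonggye_Transparency": 44,
    "sagak_bw": 42, "sagak_color": 33, "sagak_Transparency": 69,
    "gurm_bw": 47, "gurm_color": 2, "gurm_Transparency": 12,
}

total = sum(VAL_COUNTS.values())
rarest = min(VAL_COUNTS, key=VAL_COUNTS.get)
most_common = max(VAL_COUNTS, key=VAL_COUNTS.get)

# Aggregate by shape: the shape tag is the part before the first underscore.
shape_totals = {}
for name, count in VAL_COUNTS.items():
    shape = name.split("_", 1)[0]
    shape_totals[shape] = shape_totals.get(shape, 0) + count

# Print each class with its share of all speech bubbles, rarest first.
for name, count in sorted(VAL_COUNTS.items(), key=lambda kv: kv[1]):
    print(f"{name:<24}{count:>4}{count / total:>8.1%}")
print("total:", total)
print("rarest:", rarest, "most common:", most_common)
```

The rarest class, gurm_color, has only 2 of the 750 speech bubbles, while tawon_bw has 116.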

10antz22 / Bubble-Detector-YOLOv4