Note: This is not the final version of the code. I will refine and update it.
This model detects speech bubbles in webtoons and cartoons. I referenced and adapted pytorch-YOLOv4 to detect speech bubbles. The key to improving performance is data analysis. Speech bubbles come in many forms, so I define a taxonomy of speech bubble types and report training results that take the data distribution into account.
- In practice, speech bubbles in webtoons vary widely in both color and shape.
Key criteria for data definition: Shape, Color, Form
Criteria
- Shape: Ellipse (tawon), Thorn (gasi), Sea_urchin (seonggye), Rectangle (sagak), Cloud (gurm)
- Color: Black/white (bw), Colorful (color), Transparency (tran), Gradation
- Form: Basic, Double speech bubble, Multi-external, Scatter-type
- In this project, only two criteria are applied, shape and color; form and Gradation are classified as etc.
  These classes are not detection classes; they describe the distribution of the speech bubble data.
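As a minimal sketch of how the shape and color criteria combine into distribution classes, the snippet below enumerates every shape and color pair. The names `SHAPES`, `COLORS`, and `distribution_labels` are hypothetical and not part of this repository; the label spelling (for example `tawon_Transparency`) is assumed from the validation table in this README, and Gradation and form are left out because they fall under etc.

```python
from itertools import product

# Shape and color names as used in the validation table headers (assumed spelling).
SHAPES = ["tawon", "gasi", "seonggye", "sagak", "gurm"]
COLORS = ["bw", "color", "Transparency"]


def distribution_labels(shapes=SHAPES, colors=COLORS):
    """Return one label per (shape, color) pair, e.g. 'tawon_bw'."""
    return [f"{shape}_{color}" for shape, color in product(shapes, colors)]


if __name__ == "__main__":
    labels = distribution_labels()
    print(len(labels), "distribution classes")
    for label in labels:
        print(label)
```

Five shapes and three colors give 15 distribution classes, which are used only to describe the data, not as detector outputs.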
- PyTorch version
  - PyTorch 1.4.0 for TensorRT 7.0 and higher
  - PyTorch 1.5.0 and 1.6.0 for TensorRT 7.1.2 and higher
- Install dependencies
pip install onnxruntime numpy torch tensorboardX scikit_image tqdm easydict Pillow opencv_python pycocotools
or
pip install -r requirements.txt
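Several of these packages are imported under a different name than the one passed to pip, so a quick check after installation can help. The following is only a sketch using the standard library; `REQUIRED_MODULES` and `find_missing` are hypothetical names, not part of this repository.

```python
import importlib.util

# pip package name -> module name used in `import` (they differ for some packages).
REQUIRED_MODULES = {
    "onnxruntime": "onnxruntime",
    "numpy": "numpy",
    "torch": "torch",
    "tensorboardX": "tensorboardX",
    "scikit_image": "skimage",
    "tqdm": "tqdm",
    "easydict": "easydict",
    "Pillow": "PIL",
    "opencv_python": "cv2",
    "pycocotools": "pycocotools",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    print("missing packages:", ", ".join(missing) if missing else "none")
```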
Model | Link |
---|---|
YOLOv4 | Link |
YOLOv4-bubble | Link |
- 1. Download weight
- 2. Train
python train.py -g gpu_id -classes num_classes -dir 'data_dir' -pretrained 'pretrained_model.pth'
or
Train.sh
- 3. Config setting
  - cfg.py
  - cfg/yolov4.cfg
    - classes = 1
    - filters = 18, computed as (4 + 1 + classes) * 3 (lines 961, 1049, 1137)
- To train on a custom dataset, adjust the settings above to match its number of classes.
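The filter count for a custom dataset follows from the formula above: each of the 3 anchors per detection scale predicts 4 box coordinates, 1 objectness score, and one score per class. A small sketch of that arithmetic (the function name `yolo_head_filters` is hypothetical, not part of this repository):

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters for the convolution feeding each YOLO layer.

    Each anchor predicts 4 box coordinates + 1 objectness score
    + num_classes class scores.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return (4 + 1 + num_classes) * anchors_per_scale


if __name__ == "__main__":
    # One class (speech bubble), as used in this project.
    print(yolo_head_filters(1))   # 18
    # The 80-class COCO setting, for comparison.
    print(yolo_head_filters(80))  # 255
```

For a dataset with a different number of classes, the same value would go into each of the three filter entries listed above.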
- 1. Download weight
- 2. Demo
python demo.py -cfgfile cfgfile -weightfile pretrained_model.pth -imgfile image_dir
- The default cfgfile is ./cfg/yolov4.cfg
- 1. Validation dataset
tawon_bw | tawon_color | tawon_Transparency | gasi_bw | gasi_color | gasi_Transparency | seonggye_bw | seonggye_color | seonggye_Transparency | sagak_bw | sagak_color | sagak_Transparency | gurm_bw | gurm_color | gurm_Transparency | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
116 | 70 | 68 | 65 | 29 | 59 | 51 | 43 | 44 | 42 | 33 | 69 | 47 | 2 | 12 | 750 |
- The distribution above counts individual speech bubbles, not cuts (panels).
- The distribution is not uniform because a single cut can contain several speech bubbles. In addition, examples of some classes are hard to find, which leaves the distribution imbalanced, as shown above.
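To make the imbalance concrete, the sketch below computes each class's share of the validation set and the ratio between the largest and smallest class. The counts are copied from the table above; `VAL_COUNTS` and `summarize` are hypothetical names used only for this illustration.

```python
# Validation counts copied from the table above (speech bubbles per class).
VAL_COUNTS = {
    "tawon_bw": 116, "tawon_color": 70, "tawon_Transparency": 68,
    "gasi_bw": 65, "gasi_color": 29, "gasi_Transparency": 59,
    "seonggye_bw": 51, "seonggye_color": 43, "seonggye_Transparency": 44,
    "sagak_bw": 42, "sagak_color": 33, "sagak_Transparency": 69,
    "gurm_bw": 47, "gurm_color": 2, "gurm_Transparency": 12,
}


def summarize(counts):
    """Return total, per-class shares, largest and smallest class, and their ratio."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    ratio = counts[largest] / counts[smallest]
    return total, shares, largest, smallest, ratio


if __name__ == "__main__":
    total, shares, largest, smallest, ratio = summarize(VAL_COUNTS)
    print(f"total speech bubbles: {total}")
    # List classes from most to least frequent.
    for name in sorted(shares, key=shares.get, reverse=True):
        print(f"{name:24s} {VAL_COUNTS[name]:4d}  {shares[name]:6.1%}")
    print(f"largest/smallest: {largest} vs {smallest} ({ratio:.0f}x)")
```

The most frequent class (tawon_bw, 116) has 58 times as many examples as the rarest (gurm_color, 2).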