Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
Synthetic Scene-Text Image Samples
The code in the master branch is for Python2. Python3 is supported in the
The main dependencies are:
pygame, opencv (cv2), PIL (Image), numpy, matplotlib, h5py, scipy
python gen.py --viz
This will download a data file (~56M) to the data directory. This data file includes:
- dset.h5: This is a sample h5 file which contains a set of 5 images along with their depth and segmentation information. Note, this is just given as an example; you are encouraged to add more images (along with their depth and segmentation information) to this database for your own use.
- data/fonts: three sample fonts (add more fonts to this folder and then update fonts/fontlist.txt with their paths).
- data/newsgroup: Text-source (from the News Group dataset). This can be substituted with any text file. Look inside text_utils.py to see how the text inside this file is used by the renderer.
- data/models/colors_new.cp: Color-model (foreground/background text color model), learnt from the IIIT-5K word dataset.
- data/models: Other cPickle files (char_freq.cp: frequency of each character in the text dataset; font_px2pt.cp: conversion from pt to px for various fonts: If you add a new font, make sure that the corresponding model is present in this file, if not you can add it by adapting
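The font-list update mentioned in the data/fonts item can be automated. The following is a minimal sketch, not part of the repository: the helper name `rebuild_fontlist` is hypothetical, and it assumes that fontlist.txt holds one font path per line, relative to the fonts folder, and that fonts use the .ttf or .otf extension. Check the existing fontlist.txt before relying on either assumption.

```python
from pathlib import Path

# Assumption: only these extensions are treated as font files.
FONT_EXTENSIONS = {".ttf", ".otf"}


def rebuild_fontlist(fonts_dir, list_name="fontlist.txt"):
    """Write one relative font path per line and return the list of paths.

    Hypothetical helper: the one-path-per-line layout is an assumption
    about fontlist.txt, not something this README specifies.
    """
    fonts_dir = Path(fonts_dir)
    # Collect every font file below fonts_dir, as POSIX-style relative paths.
    paths = sorted(
        p.relative_to(fonts_dir).as_posix()
        for p in fonts_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in FONT_EXTENSIONS
    )
    # Overwrite the list file with the freshly collected paths.
    (fonts_dir / list_name).write_text("\n".join(paths) + "\n")
    return paths
```

Calling `rebuild_fontlist("data/fonts")` after copying new fonts into the folder would rewrite the list in one step. The font_px2pt.cp model still needs to be updated separately, as noted above.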
This script will generate random scene-text image samples and store them in an h5 file in results/SynthText.h5. If the --viz option is specified, the generated output will be visualized as the script is being run; omit the --viz option to turn off the visualizations. If you want to visualize the results stored in results/SynthText.h5 later, run:
A dataset with approximately 800000 synthetic scene-text images generated with this code can be found here.
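To check what a generated or downloaded h5 file contains without assuming its internal layout, the file can be walked with h5py. This is a minimal sketch: the helper name `describe_h5` is hypothetical, and it only lists whatever groups, datasets, and attribute names it finds.

```python
import h5py


def describe_h5(path):
    """Return one summary line per group or dataset in an HDF5 file."""
    lines = []

    def visit(name, obj):
        # Datasets report shape and dtype; anything else is a group.
        if isinstance(obj, h5py.Dataset):
            kind = "dataset shape={} dtype={}".format(obj.shape, obj.dtype)
        else:
            kind = "group"
        attrs = ", ".join(sorted(obj.attrs.keys()))
        lines.append("{}: {} attrs=[{}]".format(name, kind, attrs))

    # Open read-only so the results file cannot be modified by accident.
    with h5py.File(path, "r") as db:
        db.visititems(visit)
    return lines
```

For example, `print("\n".join(describe_h5("results/SynthText.h5")))` prints the structure of the output file after a run of gen.py.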
Adding New Images
Segmentation and depth-maps are required to use new images as background. Sample scripts for obtaining these are available here.
- predict_depth.m: MATLAB script to regress a depth mask for a given RGB image; uses the network of Liu et al. However, more recent works (e.g., this) might give better results.
- floodFill.py: for getting segmentation masks using gPb-UCM.
For an explanation of the fields in
label), please check this comment.
Pre-processed Background Images
The 8,000 background images used in the paper, along with their segmentation and depth masks, have been uploaded here:
<filename> can be:
||180K||names of images which do not contain background text|
||8.9G||images (filter these using
Note: due to its large size, depth.h5 is also available for download as 3-part split-files of 5G each. These part files are named: depth.h5-00, depth.h5-01, depth.h5-02. Download using the path above, and put them together using cat depth.h5-0* > depth.h5.
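Where cat is not available (for example on Windows), the same join can be done in Python. This is a minimal sketch with a hypothetical helper name, `join_parts`; it concatenates the part files in sorted name order, which matches the order produced by the shell glob above.

```python
import glob
import shutil


def join_parts(pattern, out_path):
    """Concatenate files matching `pattern`, in sorted name order, into out_path."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError("no files match " + pattern)
    with open(out_path, "wb") as out:
        for part in parts:
            # Stream each part so multi-gigabyte files are never held in memory.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

With the three parts in the current directory, `join_parts("depth.h5-0*", "depth.h5")` writes the combined file and returns the list of parts it used.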
use_preproc_bg.py provides sample code for reading this data.
Note: I do not own the copyright to these images.
Generating Samples with Text in non-Latin (English) Scripts
- @JarveeLee has modified the pipeline for generating samples with Chinese text here.
- @adavoudi has modified it for Arabic/Persian script, which flows from right to left, here.
- @MichalBusta has adapted it for a number of languages (e.g. Bangla, Arabic, Chinese, Japanese, Korean) here.
- @gachiemchiep has adapted it for Japanese here.
- @gungui98 has adapted it for Vietnamese here.
- @youngkyung has adapted it for Korean here.
- @kotomiDu has developed an interactive UI for generating images with text here.
Please refer to the paper for more information, or contact me (email address in the paper).