gregbugaj / TextGenerator

OCR dataset Text-Detection dataset Font-Classification dataset generator


TextGenerator

  • This is a tool for generating OCR, text detection, and font classification datasets.
  • It is intended as a convenient way to generate OCR data, text detection data, and font recognition data.

Implemented features:

-Generate text maps with different fonts, font sizes, colors, and rotation angles based on different corpora
-Support fast multi-process generation
-Fill text maps into layout blocks according to the specified layout mode
-Find smooth areas in the image to use as layout blocks
-Support extracting and exporting blocks in the text area (JSON, txt, and image files; VOC and ICDAR_LSVT dataset formats can be generated)
-Support annotations at each text level (stored in the LSVT JSON file)
-Support user-configurable generation settings (image reading, output path, various probabilities)
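To make the VOC export concrete, here is a minimal sketch of what a Pascal VOC style annotation for one generated image could look like. It is not this project's exporter: the function name `voc_annotation`, the label `"text"`, and the sample file name and coordinates are all hypothetical, and only the standard VOC fields (`filename`, `size`, `object`, `bndbox`) are written.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, boxes):
    """Build a Pascal VOC style annotation string for one image.

    boxes is a list of (label, xmin, ymin, xmax, ymax) tuples in pixels.
    This is a hypothetical helper, not part of TextGenerator.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumes an RGB image
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up sample values, for illustration only.
xml_text = voc_annotation("sample.jpg", 640, 480, [("text", 10, 20, 110, 60)])
print(xml_text)
```

The files the tool actually writes may contain more fields than this sketch.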

Effect preview

Generated image example:

Text map example:

Rotated rectangle example

Example of a single text bounding box
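The relationship between a rotated rectangle and an axis-aligned bounding box can be sketched as follows. This is only an illustration of the geometry, not the code the tool uses; the function names and the sample values are hypothetical. It rotates the four corners of a box about its center and then takes the minimum and maximum coordinates.

```python
import math


def rotated_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w x h box rotated about its center (cx, cy)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Offsets of the unrotated corners relative to the center.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


def bounding_box(points):
    """Return (xmin, ymin, xmax, ymax) enclosing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up sample: an 80 x 20 text box centered at (100, 50), rotated by 90 degrees.
corners = rotated_corners(100, 50, 80, 20, 90)
box = bounding_box(corners)
print(corners, box)
```

In image coordinates, where y grows downward, a positive angle in this sketch appears as a clockwise rotation on screen.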

-Environment installation (Python 3.6+; a conda environment is recommended)

```
# step 1
pip install -r requirements.txt
# step 2
sh make.sh
```

-Edit the configuration file config.yml (optional)

-Run the build script

    python3 run.py
    

-Generated data

The generated data is stored in the directory specified by `provider > layout > out_put_dir` in `config.yml`.
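The `>` notation denotes nested keys in the YAML file. As a minimal sketch, assuming the file has already been parsed into nested dictionaries (for example with PyYAML's `yaml.safe_load`), the output directory could be looked up like this. The helper `get_nested` and the value `"output/"` are hypothetical, and the real `config.yml` contains more keys than shown.

```python
def get_nested(config, path, separator=">"):
    """Follow an 'a > b > c' style path through nested dictionaries."""
    node = config
    for key in path.split(separator):
        node = node[key.strip()]  # raises KeyError if a level is missing
    return node


# Hypothetical stand-in for the parsed config.yml.
config = {"provider": {"layout": {"out_put_dir": "output/"}}}

out_dir = get_nested(config, "provider > layout > out_put_dir")
print(out_dir)
```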


License: MIT License

