Installation:
```bash
git clone https://github.com/manhdh32/1st_kalapa_ocr.git
cd 1st_kalapa_ocr
pip install -r requirements.txt
```
Data:
- Kalapa dataset
- Vietnamese address dataset
- Processed address corpus
- Synthetic data: download my generated data here
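If you add your own images to the data above, PaddleOCR recognition datasets are usually described by a plain-text label file with one `image_path<TAB>text` line per sample. This is the default format of PaddleOCR's `SimpleDataSet`, where image paths are resolved relative to `Train.dataset.data_dir`. Whether this repo's configs use `SimpleDataSet` is an assumption to check in `configs/*.yml`. Below is a minimal sketch of a writer for that format. The helper names and paths are hypothetical and not part of the repo.

```python
from pathlib import Path
from typing import Iterable, Tuple


def format_rec_labels(pairs: Iterable[Tuple[str, str]], delimiter: str = "\t") -> str:
    """Render (image_path, text) pairs as PaddleOCR-style label lines.

    Assumes SimpleDataSet's default tab delimiter: a tab or newline inside
    the text would corrupt the split, so such labels are rejected.
    """
    lines = []
    for img_path, text in pairs:
        if delimiter in text or "\n" in text:
            raise ValueError(f"label for {img_path!r} contains the delimiter or a newline")
        lines.append(f"{img_path}{delimiter}{text}")
    return ("\n".join(lines) + "\n") if lines else ""


def write_rec_labels(pairs: Iterable[Tuple[str, str]], out_file: str) -> None:
    """Write the label file as UTF-8 so Vietnamese diacritics survive."""
    out = Path(out_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(format_rec_labels(pairs), encoding="utf-8")
```

For example, `write_rec_labels([("images/0001.jpg", "Phường 5, Quận 3")], "train_data/my_labels.txt")` writes one line. You would then add that file to `Train.dataset.label_file_list` in the config.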
Download the pretrained model here, or generate synthetic data with:
```bash
python tools/gen_synthetic_data.py
```
and train from scratch:
```bash
python PaddleOCR/tools/train.py -c configs/pretrained_config.yml
```
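Before training from scratch, check that every character in the training text is covered by the recognizer's dictionary. PaddleOCR's recognition head can only output characters listed in the file referenced by `Global.character_dict_path`. That file holds one character per line, and PaddleOCR appends a space itself when `use_space_char: true` is set. Vietnamese text can also mix precomposed and combining diacritics, so normalize to NFC before comparing. The sketch below builds a dictionary from label text and reports characters a given dictionary lacks. The function names are hypothetical, and the repo's configs may already reference a suitable dictionary, so look at `configs/pretrained_config.yml` before replacing anything.

```python
import unicodedata
from pathlib import Path
from typing import Iterable, List, Set


def _chars(texts: Iterable[str]) -> Set[str]:
    """Collect NFC-normalized characters, ignoring spaces."""
    found: Set[str] = set()
    for text in texts:
        found.update(unicodedata.normalize("NFC", text))
    found.discard(" ")  # PaddleOCR adds the space itself when use_space_char is true
    return found


def build_char_dict(labels: Iterable[str]) -> List[str]:
    """Sorted list of every character used in the labels."""
    return sorted(_chars(labels))


def missing_chars(labels: Iterable[str], dict_chars: Iterable[str]) -> List[str]:
    """Characters that appear in the labels but not in the dictionary."""
    return sorted(_chars(labels) - _chars(dict_chars))


def write_char_dict(chars: List[str], out_file: str) -> None:
    """Write one character per line, UTF-8, as character_dict_path expects."""
    Path(out_file).write_text("\n".join(chars) + "\n", encoding="utf-8")
```

Normalizing both the labels and the dictionary means an NFD-encoded "ộ" in one and an NFC "ộ" in the other are treated as the same character rather than reported as missing.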
Fine-tune:
```bash
python PaddleOCR/tools/train.py -c configs/final_kalapa.yml
```
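The steps above can be chained in a small driver script. This is a hypothetical sketch, not part of the repo, and it assumes you run it from the repository root. It uses PaddleOCR's `-o` option to override `Global.pretrained_model` for the fine-tuning stage. The checkpoint path is a placeholder: read the real output location from `Global.save_model_dir` in `configs/pretrained_config.yml`. Pass `None` if `configs/final_kalapa.yml` already points at the right weights.

```python
import subprocess
import sys
from typing import List, Optional

TRAIN_SCRIPT = "PaddleOCR/tools/train.py"


def train_cmd(config: str, pretrained: Optional[str] = None) -> List[str]:
    """Build a PaddleOCR training command; -o overrides one config key."""
    cmd = [sys.executable, TRAIN_SCRIPT, "-c", config]
    if pretrained:
        cmd += ["-o", f"Global.pretrained_model={pretrained}"]
    return cmd


def run_pipeline(from_scratch: bool, pretrained_ckpt: Optional[str] = None,
                 dry_run: bool = False) -> List[List[str]]:
    """Run (or only print) the README's steps in order.

    from_scratch=True regenerates synthetic data and pretrains first;
    otherwise only the fine-tuning stage runs (e.g. after downloading the
    pretrained model). pretrained_ckpt is a placeholder path.
    """
    cmds: List[List[str]] = []
    if from_scratch:
        cmds.append([sys.executable, "tools/gen_synthetic_data.py"])
        cmds.append(train_cmd("configs/pretrained_config.yml"))
    cmds.append(train_cmd("configs/final_kalapa.yml", pretrained_ckpt))
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return cmds
```

`run_pipeline(from_scratch=True, pretrained_ckpt="output/pretrain/best_accuracy", dry_run=True)` prints the three commands without executing them. The placeholder path omits the `.pdparams` extension, following PaddleOCR's convention for `pretrained_model`.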
Inference notebook here
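Outside the notebook, PaddleOCR ships `tools/infer_rec.py`, which runs a trained recognition model on one image or a directory of images. The command builder below is a minimal sketch under two assumptions: the checkpoint and image paths are placeholders, and `configs/final_kalapa.yml` works as the inference config. It has not been checked against the notebook's own code.

```python
import subprocess
import sys
from typing import List


def infer_cmd(config: str, checkpoint: str, images: str) -> List[str]:
    """Build a PaddleOCR recognition-inference command.

    -o takes several key=value overrides: the checkpoint to load and the
    image file or directory to recognize.
    """
    return [
        sys.executable, "PaddleOCR/tools/infer_rec.py", "-c", config,
        "-o", f"Global.pretrained_model={checkpoint}", f"Global.infer_img={images}",
    ]


def run_inference(config: str, checkpoint: str, images: str) -> None:
    """Run inference; infer_rec.py reports the predictions itself."""
    subprocess.run(infer_cmd(config, checkpoint, images), check=True)
```

For example, `run_inference("configs/final_kalapa.yml", "output/kalapa/best_accuracy", "path/to/test_images")`, where both paths are placeholders.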