martholomew / text-binarization-groundtruth

Groundtruth for Text Binarization Models

Text Binarization Groundtruth

Processing

To prepare the images for SAM-fine-tune, I ran this command on all of the images:

for img in *; do name="${img%.*}"; convert "$img" -alpha off -type TrueColor png24:"$name".png; done

This converts them all to RGB PNG files with no transparency.
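
The same conversion can be sketched as a small reusable script. This is only an illustration, assuming ImageMagick's `convert` is on the PATH and using the same flags as the command above. The function names `png_name`, `convert_all`, and `is_rgb_png` are hypothetical, and the `file`-based check assumes the common `file(1)` wording for PNG files ("8-bit/color RGB"), which may differ between builds.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of the repository.

# Derive the output file name by replacing the last extension with .png.
png_name() {
  printf '%s.png\n' "${1%.*}"
}

# Convert every regular file in the current directory to a 24-bit RGB PNG
# with the alpha channel disabled (same convert flags as the original command).
convert_all() {
  for img in *; do
    [ -f "$img" ] || continue        # skip directories and an unmatched glob
    convert "$img" -alpha off -type TrueColor "png24:$(png_name "$img")"
  done
}

# Assumption: file(1) describes a 24-bit PNG as "... 8-bit/color RGB, ...";
# an image that kept its alpha channel would read "RGBA" instead.
is_rgb_png() {
  file -b "$1" | grep -q '8-bit/color RGB,'
}
```

Running `convert_all` inside an image directory should behave like the one-line loop, and `is_rgb_png some.png` can then spot-check a result. Two inputs that share a stem (for example `page.jpg` and `page.tif`) would write to the same `page.png`, so such directories need renaming first.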

Sources

Repo Name            Link
DIBCO_2009(_PRINT)   DIBCO 2009
DIBCO_2010           DIBCO 2010
DIBCO_2011(_PRINT)   DIBCO 2011
DIBCO_2012           DIBCO 2012
DIBCO_2013           DIBCO 2013
DIBCO_2014           H-DIBCO 2014
DIBCO_2016           H-DIBCO 2016
DIBCO_2017           DIBCO 2017
DIBCO_2018           H-DIBCO 2018
DIBCO_2019           DIBCO 2019
LIVEMEMORY           LiveMemory Dataset (dataset 3)
NABUCO_(1/2)         Nabuco Dataset
PERSIAN              PHIBD 2012
BICKLEY              Bickley Diary Dataset
RENNES               Custom Dataset (CC BY-NC 4.0)
BLEEDTHROUGH         Bleed-Through Database

Unused Sources

Bleed-Through Database

  • Did not use UCD.MSA29.12v because it was too messy; NLI.MSG18.147 was cleaned up manually.

Palm Leaf Manuscript Dataset

  • The groundtruth lines follow the correct stroke paths but do not match the stroke thickness.

DIVA-HisDB

  • Too inaccurate for use here.
