Groundtruth for Text Binarization Models
To prepare the images for SAM-fine-tune, I ran this command on all of the images:
for img in *; do name="${img%.*}" convert "$img" -alpha off -type TrueColor png24:"$name".png; done
To convert them all to RGB PNG files with no transparency.
Repo Name | Link |
---|---|
DIBCO_2009(_PRINT) | DIBCO 2009 |
DIBCO_2010 | DIBCO 2010 |
DIBCO_2011(_PRINT) | DIBCO 2011 |
DIBCO_2012 | DIBCO 2012 |
DIBCO_2013 | DIBCO 2013 |
DIBCO_2014 | H-DIBCO 2014 |
DIBCO_2016 | H-DIBCO 2016 |
DIBCO_2017 | DIBCO 2017 |
DIBCO_2018 | H-DIBCO 2018 |
DIBCO_2019 | DIBCO 2019 |
LIVEMEMORY | LiveMemory Dataset (dataset 3) |
NABUCO_(1/2) | Nabuco Dataset |
PERSIAN | PHIBD 2012 |
BICKLEY | Bickley Diary Dataset |
RENNES | Custom Dataset (CC BY-NC 4.0) |
BLEEDTHROUGH | Bleed-Through Database |
- Did not use
UCD.MSA29.12v
as it was too messy and cleaned upNLI.MSG18.147
manually.
- The lines are accurate in path but not thickness.
- Too innacurate for use here.