ocr tesseract pdf convenient image-processing

Simple-OCR

Simple-OCR provides a more convenient way of reading PDFs and images using the Tesseract engine.

Installation Instructions

Install Tesseract.
Install ImageMagick.
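
To confirm that both prerequisites are reachable before using Simple-OCR, a small check like the one below may help. It is only a sketch for Unix-like systems: `executable_on_path?` is a hypothetical helper (not part of Simple-OCR), and the command names are assumptions (`tesseract` for Tesseract; `magick` for ImageMagick 7 or `convert` for older releases).

```ruby
# Hypothetical helper, not part of Simple-OCR: returns true if an
# executable called `name` exists in one of the PATH directories.
def executable_on_path?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Assumed command names: ImageMagick 7 installs `magick`,
# older releases install `convert`.
checks = {
  "Tesseract"   => %w[tesseract],
  "ImageMagick" => %w[magick convert]
}

checks.each do |tool, commands|
  found = commands.find { |cmd| executable_on_path?(cmd) }
  puts(found ? "#{tool}: found (#{found})" : "#{tool}: not found on PATH")
end
```

If either tool is reported as missing, install it with your system's package manager and run the check again.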

Example Usage

It's very simple to use Simple-OCR:

# Specify the path of your source image or PDF.
img = OCR::Image.new("source.png")

# Specify the output file name, called "destination" here.
img.scan("destination", "-l eng", :pdf)

You can also pass custom command-line options.

img.scan("destination", "-l eng -psm 1...", :pdf)

It is also possible to specify the output file type, which can be one of:

pdf
txt
hocr

img.scan("destination", "-l eng", :txt)
img.scan("destination", "-l eng", :hocr)
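
For orientation, Tesseract's own command line takes the form `tesseract imagename outputbase [options...] [configfile...]`. The sketch below shows one way the three `scan` arguments could map onto such a command. It is not Simple-OCR's actual implementation: `tesseract_command` and `OUTPUT_TYPES` are hypothetical names, and treating the output type as a trailing Tesseract config name is an assumption.

```ruby
require "shellwords"

# Output types listed above (assumed to match Tesseract config names).
OUTPUT_TYPES = %i[pdf txt hocr].freeze

# Hypothetical helper, not part of Simple-OCR: builds the argument
# list for a tesseract invocation without running it.
def tesseract_command(source, destination, options = "", type = :txt)
  unless OUTPUT_TYPES.include?(type)
    raise ArgumentError, "unsupported output type: #{type.inspect}"
  end

  # Split the option string the way a shell would, then append the type.
  ["tesseract", source, destination, *Shellwords.split(options), type.to_s]
end

puts tesseract_command("source.png", "destination", "-l eng", :pdf).shelljoin
```

Returning an array rather than a single string means the result can be handed to `system(*command)` without further shell quoting.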

About

Simple-OCR is maintained and funded by Skcript. The names and logos for Skcript are the property of Skcript.

We love open source and have contributed quite a bit to the community. Take a look at our contributions here. We also encourage the people around us to get involved in community operations. Join us if you'd like to see the world change from our HQ.

www.skcript.com
