skcript / simple-ocr

A convenient way of reading PDFs and images using Tesseract

Home Page: www.skcript.com

Simple-OCR

Simple-OCR provides a more convenient way of reading PDFs and images using the Tesseract engine.

Installation Instructions

  1. Install Tesseract.
  2. Install ImageMagick.
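Before using Simple-OCR, it can help to confirm that both prerequisites are actually on your `PATH`. The sketch below is a minimal, standalone check. It is not part of Simple-OCR, and it assumes Tesseract installs a `tesseract` executable and ImageMagick installs `magick` (ImageMagick 7) or `convert` (ImageMagick 6). The helper names `find_executable` and `missing_dependencies` are hypothetical.

```ruby
# Return the full path of an executable found on the search path, or nil.
# Note: does not try Windows extensions such as .exe.
def find_executable(name, search_path = ENV.fetch("PATH", ""))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumed executable names for each prerequisite; any one name is enough.
REQUIREMENTS = {
  "Tesseract"   => %w[tesseract],
  "ImageMagick" => %w[magick convert]
}.freeze

# Return the prerequisites for which none of the candidate executables exist.
def missing_dependencies(search_path = ENV.fetch("PATH", ""))
  missing = REQUIREMENTS.reject do |_tool, names|
    names.any? { |n| find_executable(n, search_path) }
  end
  missing.keys
end

if __FILE__ == $PROGRAM_NAME
  missing = missing_dependencies
  if missing.empty?
    puts "Tesseract and ImageMagick found."
  else
    warn "Missing: #{missing.join(', ')}"
  end
end
```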

Example Usage

It's very simple to use Simple-OCR:

# Specify the path of your source image or PDF.
img = OCR::Image.new("source.png")

# Specify the output file name ("destination" here), the Tesseract options and the output type.
img.scan("destination", "-l eng", :pdf)
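A call like `img.scan("destination", "-l eng", :pdf)` presumably ends in a Tesseract command line. The sketch below shows what a roughly equivalent direct invocation looks like, using the Tesseract CLI form `tesseract <image> <outputbase> [options] [configfile]`, where `pdf`, `txt` and `hocr` are Tesseract config names. It is an assumption about how such a wrapper works, not Simple-OCR's actual code, and `tesseract_command` is a hypothetical name.

```ruby
require "shellwords"

# Output types listed in this README; each matches a Tesseract config name.
OUTPUT_TYPES = %i[pdf txt hocr].freeze

# Hypothetical helper: builds a Tesseract argument vector roughly equivalent
# to img.scan(destination, options, type). Illustration only.
def tesseract_command(source, destination, options, type)
  unless OUTPUT_TYPES.include?(type)
    raise ArgumentError, "unsupported output type: #{type.inspect}"
  end

  # Tesseract CLI form: tesseract <image> <outputbase> [options...] [configfile...]
  ["tesseract", source, destination, *Shellwords.split(options), type.to_s]
end

if __FILE__ == $PROGRAM_NAME
  argv = tesseract_command("source.png", "destination", "-l eng", :pdf)
  puts argv.shelljoin
  # To actually run it (requires Tesseract on PATH): system(*argv)
end
```

Returning an argument array rather than one string lets you run it with `system(*argv)`, which bypasses the shell, so file names containing spaces need no extra quoting.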

You can also pass custom Tesseract command-line options as the second argument:

img.scan("destination", "-l eng -psm 1...", :pdf)
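The options string appears to be passed through to Tesseract, so its flags must match your Tesseract version: Tesseract 3 spells the page segmentation flag `-psm`, while Tesseract 4 and later use `--psm`, and multiple languages are joined with `+` (for example `-l eng+deu`). The hypothetical helper below composes such a string; its `tesseract_major` parameter is an illustrative assumption, not a Simple-OCR option.

```ruby
# Hypothetical helper for composing the options string passed as the
# second argument to img.scan. Tesseract 3 uses "-psm"; 4+ uses "--psm".
def tesseract_options(lang: "eng", psm: nil, tesseract_major: 4)
  # Several languages are joined with "+", e.g. "eng+deu".
  parts = ["-l", Array(lang).join("+")]
  unless psm.nil?
    flag = tesseract_major >= 4 ? "--psm" : "-psm"
    parts.push(flag, psm.to_s)
  end
  parts.join(" ")
end

if __FILE__ == $PROGRAM_NAME
  puts tesseract_options(psm: 1)                     # -l eng --psm 1
  puts tesseract_options(psm: 1, tesseract_major: 3) # -l eng -psm 1
end
```

With Simple-OCR this would be used as `img.scan("destination", tesseract_options(psm: 1), :pdf)`.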

You can also specify the output file type with the third argument, which can be one of:

  • pdf
  • txt
  • hocr

img.scan("destination", "-l eng", :txt)
img.scan("destination", "-l eng", :hocr)
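Assuming Simple-OCR passes "destination" to Tesseract as the output base name, Tesseract appends an extension matching the chosen type, so `:txt` should produce `destination.txt` (recent Tesseract versions write hOCR to `.hocr`). The sketch below, with hypothetical helpers `expected_output_path` and `read_text_output`, locates and reads that file; check the actual file name against your installed versions.

```ruby
# Assumed mapping from output type to the extension Tesseract appends.
EXTENSIONS = { pdf: ".pdf", txt: ".txt", hocr: ".hocr" }.freeze

# Hypothetical helper: the file expected for a given base name and type.
def expected_output_path(destination, type)
  ext = EXTENSIONS.fetch(type) do
    raise ArgumentError, "unsupported output type: #{type.inspect}"
  end
  destination + ext
end

# Read back plain-text output, or return nil if the file does not exist yet.
def read_text_output(destination)
  path = expected_output_path(destination, :txt)
  File.exist?(path) ? File.read(path) : nil
end
```

After `img.scan("destination", "-l eng", :txt)`, the recognized text would then be available via `read_text_output("destination")`.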

About

Skcript

Simple-OCR is maintained and funded by Skcript. The names and logos for Skcript are the property of Skcript.

We love open source, and we have made quite a few contributions to the community. Take a look at them here. We also encourage people around us to get involved in community projects. Join us if you'd like to see the world change from our HQ.

Languages

Shell 84.6%, Ruby 15.4%