PDF (IMAGE) TO WORD CONVERTER

ABSTRACT

Optical Character Recognition (OCR) extracts text from an image. Its main goal is to produce easily viewable and editable documents from existing paper documents or image files. With Tesseract OCR, recognition accuracy on a clean image file can reach 97.56%.

We developed a web application that performs OCR with Tesseract.js, a JavaScript version of the Tesseract OCR engine, and uses other APIs to convert the recognized text to PDF files and to output the text as audio. The audio output can help visually impaired or illiterate users, and lets anyone hear the pronunciation of words clearly and accurately.
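The OCR and audio steps described above could look roughly like the sketch below. It is not taken from this repository: `recognizeText` and `speak` are hypothetical helper names, the OCR part assumes Tesseract.js v5 or later (where `createWorker` accepts a language code), and the audio part assumes the browser's Web Speech API, since the API actually used for audio is not named here.

```javascript
// Sketch only. Assumes tesseract.js v5+ is installed (npm install tesseract.js).
// recognizeText and speak are hypothetical helper names, not taken from the repo.

// Run OCR on an image (file path, URL, or buffer) and return the recognized text.
// createWorkerFn is optional; it exists so a different worker factory can be supplied.
async function recognizeText(image, lang = 'eng', createWorkerFn) {
  // Load tesseract.js lazily, only when no factory was passed in.
  const createWorker = createWorkerFn || require('tesseract.js').createWorker;
  const worker = await createWorker(lang);
  try {
    const { data } = await worker.recognize(image);
    return data.text.trim();
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
}

// Read text aloud. Assumption: the Web Speech API is available (browsers only).
// Returns false when speech synthesis is not available, true otherwise.
function speak(text) {
  if (typeof speechSynthesis === 'undefined' ||
      typeof SpeechSynthesisUtterance === 'undefined') {
    return false;
  }
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}
```

A call such as `recognizeText('scan.png').then(text => console.log(text))` would print the recognized text, where `scan.png` is a placeholder file name. The worker is terminated in a `finally` block so that a failed recognition does not leave it running.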

INSTALLATION

Clone the repo
Run npm init

LICENSE

Apache License 2.0
