Optical Character Recognition (OCR) extracts text from an image. Its main purpose is to turn existing paper documents or image files into documents that are easy to view and edit. On a clean image file, Tesseract OCR can reach an accuracy as high as 97.56%.
Here we developed a web application that performs OCR with Tesseract.js, a JavaScript port of the Tesseract OCR engine, and uses other APIs to convert the recognized text to PDF files and to output the text as audio. The audio output can help visually impaired or illiterate users hear how words are pronounced clearly and accurately.
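The recognition and audio steps described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the helper names `cleanRecognizedText`, `recognizeImage`, and `speakText` are hypothetical, the `'eng'` language code is an assumption, and using the browser's Web Speech API for audio is also an assumption, since the APIs the project uses for audio are not named here.

```javascript
// Hypothetical helper: tidy raw OCR output by collapsing runs of spaces/tabs,
// trimming each line, and dropping empty lines.
function cleanRecognizedText(raw) {
  return raw
    .split(/\r?\n/)
    .map((line) => line.replace(/[ \t]+/g, ' ').trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Hypothetical helper: run Tesseract.js on an image (path, URL, or File)
// and return the cleaned text. Assumes the `tesseract.js` package is installed.
async function recognizeImage(image) {
  // Loaded lazily so the other helpers can be used without the package.
  const Tesseract = require('tesseract.js');
  const { data } = await Tesseract.recognize(image, 'eng'); // 'eng' is an assumption
  return cleanRecognizedText(data.text);
}

// Hypothetical helper: read text aloud with the browser's Web Speech API.
// Returns false when speech synthesis is unavailable (for example, in Node.js).
function speakText(text) {
  if (typeof window === 'undefined' || !window.speechSynthesis) {
    return false;
  }
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}
```

In a browser, something like `recognizeImage(file).then(speakText)` would chain the two steps. Cleaning the text before speaking it is a design choice in this sketch, intended to avoid stray blank lines and spacing noise in the OCR output.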
- Clone the repo
- Run `npm init`