optical-character-recognition tesseract docx ppm-image

optical-character-recognition

This repository provides two functions for reading content and metadata: read.ocr for image-based PDF files and read.docx for Word documents. read.ocr uses Tesseract to perform optical character recognition (OCR) on an image PDF. read.docx unzips the .docx file to get at the XML inside it, then extracts the data from that XML.
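The two approaches described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names extract_docx_text, read_docx_sketch and ocr_pdf_sketch are hypothetical, and using the pdftools and tesseract packages to render and recognise pages is an assumption about one reasonable way to do it. The .docx part relies only on base R and a simple pattern match, so it does not decode XML entities such as &amp;amp;.

```r
# Hypothetical helpers for illustration; these are NOT read.ocr / read.docx.

# Pull the text runs (<w:t> elements) out of WordprocessingML markup,
# returning one string per non-empty paragraph.
extract_docx_text <- function(xml) {
  xml <- paste(xml, collapse = "")
  # Paragraphs end with </w:p>, so split the markup there.
  paragraphs <- strsplit(xml, "</w:p>", fixed = TRUE)[[1]]
  text <- vapply(paragraphs, function(p) {
    # Match <w:t> or <w:t attr="..."> but not other tags such as <w:tab/>.
    hits <- gregexpr("<w:t(\\s[^>]*)?>[^<]*</w:t>", p, perl = TRUE)
    runs <- regmatches(p, hits)[[1]]
    # Strip the tags and join the runs of this paragraph.
    paste(gsub("<[^>]+>", "", runs), collapse = "")
  }, character(1), USE.NAMES = FALSE)
  text[nzchar(text)]
}

# A .docx file is a zip archive; the body text lives in word/document.xml.
read_docx_sketch <- function(path) {
  exdir <- tempfile("docx_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  utils::unzip(path, files = "word/document.xml", exdir = exdir)
  xml <- readLines(file.path(exdir, "word", "document.xml"),
                   warn = FALSE, encoding = "UTF-8")
  extract_docx_text(xml)
}

# Assumption: pages are rendered to PNG with pdftools, then passed to the
# tesseract package. Returns one string of recognised text per page.
ocr_pdf_sketch <- function(path, language = "eng", dpi = 300) {
  stopifnot(requireNamespace("pdftools", quietly = TRUE),
            requireNamespace("tesseract", quietly = TRUE))
  # pdf_convert() writes the page images to the working directory.
  images <- pdftools::pdf_convert(path, format = "png", dpi = dpi)
  on.exit(unlink(images), add = TRUE)
  tesseract::ocr(images, engine = tesseract::tesseract(language))
}
```

With these definitions, read_docx_sketch("report.docx") would return the document's paragraphs as a character vector, and ocr_pdf_sketch("scan.pdf") the recognised text of each page; both file names are placeholders. Metadata is not covered here; in a .docx archive it is typically stored in docProps/core.xml and could be read the same way.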


Languages

R 100.0%