jmsquare / optical-character-recognition

This repository provides two functions for reading content and metadata: read.ocr, for image-based PDF files, and read.docx, for Word documents. read.ocr runs optical character recognition (OCR) on an image PDF using Tesseract. read.docx unzips the .docx file to access the XML files inside it and extracts the data from them.
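The two approaches described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `docx_paragraphs`, `read_docx_sketch`, and `ocr_pdf_sketch` are hypothetical, and the sketch assumes the `xml2`, `pdftools`, and `tesseract` packages are installed. The real read.docx and read.ocr may have different arguments and return values.

```r
# Hypothetical helpers for illustration; NOT the repository's read.docx / read.ocr.

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
w_ns <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")

# Return one string per paragraph (<w:p>) by joining its text runs (<w:t>).
docx_paragraphs <- function(doc) {
  paragraphs <- xml2::xml_find_all(doc, "//w:p", w_ns)
  vapply(paragraphs, function(p) {
    runs <- xml2::xml_find_all(p, ".//w:t", w_ns)
    paste(xml2::xml_text(runs), collapse = "")
  }, character(1))
}

# Unzip a .docx into a temporary directory, then read its text and metadata.
read_docx_sketch <- function(path) {
  exdir <- tempfile("docx_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  utils::unzip(path, exdir = exdir)

  doc <- xml2::read_xml(file.path(exdir, "word", "document.xml"))

  # docProps/core.xml usually holds author, dates, and similar properties.
  core_path <- file.path(exdir, "docProps", "core.xml")
  metadata <- list()
  if (file.exists(core_path)) {
    fields <- xml2::xml_children(xml2::read_xml(core_path))
    metadata <- as.list(stats::setNames(xml2::xml_text(fields),
                                        xml2::xml_name(fields)))
  }
  list(content = docx_paragraphs(doc), metadata = metadata)
}

# Render each PDF page to a PNG, then run Tesseract on every page image.
ocr_pdf_sketch <- function(path, language = "eng", dpi = 300) {
  pages <- pdftools::pdf_convert(path, format = "png", dpi = dpi,
                                 verbose = FALSE)
  on.exit(unlink(pages), add = TRUE)
  engine <- tesseract::tesseract(language)
  text <- vapply(pages, function(p) tesseract::ocr(p, engine = engine),
                 character(1))
  list(content = unname(text), metadata = pdftools::pdf_info(path))
}
```

Under these assumptions, `read_docx_sketch("report.docx")$content` would give a character vector with one element per paragraph, and `ocr_pdf_sketch("scan.pdf")$content` one element per page; both file names are placeholders. Rendering at a higher `dpi` tends to help recognition on scanned pages, at the cost of speed.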

Languages

R: 100.0%