Ankit-rana / table_extractor

extractor is a tool to extract table from pdf and create spreadsheet out of it.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Table Extractor

Introduction

extractor is a tool to extract table from pdf and create excel out of it.

Prerequisite

please install 2 dependencies.

  • Tkinter
  • ghostscript

For ubuntu

apt install python3-tk ghostscript

Or For Centos

yum install tkinter ghostscript

Installation

$ virtualenv ENV
$ source ENV/bin/activate
$ git clone https://github.com/Ankit-rana/table_extractor.git
$ cd table_extractor
$ pip3 install -r requirements.txt
$ python3 setup.py install

Steps

$ extractor sample.pdf 
$ ls
foo.xlsx

Configuration

  • extractor also provides ways to configure it. you can find configuration in /etc/extractor.conf
  • take a look at the sample configuration file
[DEFAULT]
START_FIELD_NAME=Booking Date
DATEFIELDS=0,1,3 
AMOUNTFIELDS=4,5,6

About

extractor is a tool to extract table from pdf and create spreadsheet out of it.


Languages

Language:Python 100.0%