rbento / pdf-table-to-csv

Attempts to extract tables from well-formed PDF files into CSV files.


pdf-table-to-csv

  • Attempts to extract tables from well-formed PDF files into CSV files.

  • PDF files containing images or more complex internal structures may not be properly converted.
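As a rough illustration of the CSV side of the conversion, here is a minimal sketch. It assumes the extraction step has already produced each table as a list of rows of cell strings, with `None` for empty cells. That is the shape returned by libraries such as pdfplumber's `Page.extract_tables()`, but this README does not say which library the project uses. The function name `tables_to_csv` and the whitespace clean-up are hypothetical choices, not the repository's code.

```python
import csv
import io
from typing import Iterable, Optional

# One extracted table: a list of rows, each row a list of cells (None = empty cell).
Table = list[list[Optional[str]]]


def tables_to_csv(tables: Iterable[Table]) -> str:
    """Concatenate the rows of every table into a single CSV document.

    Hypothetical helper: assumes the tables were already extracted from the PDF.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: comma-separated, "\r\n" line endings
    for table in tables:
        for row in table:
            # PDF cells often wrap across lines; collapse runs of whitespace
            # (including newlines) into single spaces. Empty cells become "".
            writer.writerow([" ".join(cell.split()) if cell else "" for cell in row])
    return buffer.getvalue()
```

To write a file rather than a string, open it with `newline=""`, as the `csv` module documentation recommends, so the writer's `\r\n` line endings are not translated a second time.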

Dependencies


  • Python 3.11
  • pipenv
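If you run the script outside the pipenv shell, a quick interpreter check can catch a mismatched Python early. This is a minimal sketch: the name `python_is_supported` is hypothetical, and treating any release at or above 3.11 as acceptable is an assumption (the project's Pipfile may pin 3.11 exactly).

```python
from __future__ import annotations

import sys

# Minimum version taken from the dependency list above; accepting newer
# releases is an assumption, not something the README states.
REQUIRED_PYTHON = (3, 11)


def python_is_supported(version: tuple[int, ...] | None = None) -> bool:
    """Return True if `version` (default: the running interpreter) is new enough."""
    if version is None:
        version = sys.version_info[:2]
    # Compare only (major, minor); the micro version doesn't matter here.
    return tuple(version[:2]) >= REQUIRED_PYTHON
```

A script could then call `sys.exit("Python 3.11+ is required")` when `python_is_supported()` returns `False`.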

Usage


# Clone 
git clone git@github.com:rbento/pdf-table-to-csv.git
# or
git clone https://github.com/rbento/pdf-table-to-csv.git

# Change directory
cd pdf-table-to-csv

# Sync and activate the virtual environment
pipenv sync
pipenv shell

# Convert one file
python convert.py /path/to/file.pdf   

# Convert multiple files
python convert.py /path/to/file1.pdf /path/to/file2.pdf 
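The command line above accepts one or more file paths. One way to express that contract is with `argparse` and `nargs="+"`, sketched below. The function name `parse_args` and the help text are hypothetical; the repository's `convert.py` may read `sys.argv` differently.

```python
from __future__ import annotations

import argparse


def parse_args(argv: list[str] | None = None) -> list[str]:
    """Return the file paths given as `python convert.py FILE [FILE ...]`."""
    parser = argparse.ArgumentParser(
        prog="convert.py",
        description="Extract tables from well-formed PDF files into CSV files.",
    )
    # nargs="+" requires at least one path and keeps them in command-line order.
    parser.add_argument("files", nargs="+", help="PDF files to convert")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv).files
```

Calling it with no paths makes `argparse` print a usage message and exit with status 2.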

Example


pipenv shell
(pdf-table-to-csv) ~/Workspace/pdf-table-to-csv $ python convert.py ~/Desktop/tax_slips.pdf ~/Desktop/stocks.py
Converting /Users/rbento/Desktop/tax_slips.pdf
> Converted to /Users/rbento/Desktop/tax_slips.csv
Skipping non-pdf source file /Users/rbento/Desktop/stocks.py
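The log above suggests the per-file behaviour: each `.csv` is written next to its source PDF with the same base name, and non-PDF arguments are skipped. The sketch below reproduces that log format. `convert_all` and the `convert` callback (which would do the actual table extraction) are hypothetical names, and deciding by a case-insensitive `.pdf` extension is an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable


def convert_all(sources: Iterable[str], convert: Callable[[Path, Path], None]) -> list[str]:
    """Convert each PDF to a sibling .csv file, skip everything else, return the log lines."""
    log: list[str] = []
    for source in sources:
        path = Path(source).expanduser().absolute()
        # Decide by extension only (assumption: the file contents are not inspected).
        if path.suffix.lower() != ".pdf":
            log.append(f"Skipping non-pdf source file {path}")
            continue
        target = path.with_suffix(".csv")  # e.g. tax_slips.pdf -> tax_slips.csv
        log.append(f"Converting {path}")
        convert(path, target)  # hypothetical callback doing the real work
        log.append(f"> Converted to {target}")
    return log
```

Wired to the command line, this could look like `for line in convert_all(sys.argv[1:], extract_to_csv): print(line)`, where `extract_to_csv` is whatever function performs the extraction.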

About


License: MIT License


Languages

Python 100.0%