sinopsysHK / HsbcHkPdfScraper

Scraper for HK HSBC PDF account statements

HsbcStatementHKScraper

A simple, quick-and-dirty Python 3 scraper for HSBC Hong Kong account statement PDFs.

It has worked without errors on at least my own statements from the last four years.

Usage

From the command line:

$ python hsbcpdf\scraper.py <pdf file path> <outputdir>

This writes a CSV file into <outputdir>, named after the pattern [statement type]-[account number]-[statement date yyyymm].csv
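The naming pattern can be reproduced from the fields that get_json() returns (structure shown further down). This is a sketch, not the scraper's own code: csv_filename, write_entries and export are hypothetical helpers, and it assumes that [statement type] is the "type" value, that the account number is used verbatim (dashes included), and a column order that the real CSV may not share.

```python
import csv
import io
import os
from datetime import datetime

# Assumed column order; the scraper's real CSV layout may differ.
FIELDS = ["account", "date", "description", "currency", "amount"]


def csv_filename(statement: dict) -> str:
    """Hypothetical helper: [statement type]-[account number]-[yyyymm].csv"""
    # statement_date is assumed to be dd/mm/yyyy, as in the sample JSON
    date = datetime.strptime(statement["statement_date"], "%d/%m/%Y")
    return f"{statement['type']}-{statement['main_account']}-{date:%Y%m}.csv"


def write_entries(statement: dict, out) -> None:
    """Write a header plus one CSV row per statement entry to a text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for entry in statement["entries"]:
        writer.writerow({name: entry[name] for name in FIELDS})


def export(statement: dict, outputdir: str) -> str:
    """Hypothetical helper: write the CSV into outputdir and return its path."""
    path = os.path.join(outputdir, csv_filename(statement))
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_entries(statement, f)
    return path


sample = {
    "main_account": "XXX-YYYYYY-ZZZ",
    "type": "BANK",
    "statement_date": "25/05/2019",
    "entries": [
        {"account": "HKDSavings", "date": "27/04/2019",
         "description": "MONTHLY EARNINGS", "currency": "HKD",
         "amount": 1000000.00},
    ],
}
print(csv_filename(sample))  # BANK-XXX-YYYYYY-ZZZ-201905.csv
buf = io.StringIO()
write_entries(sample, buf)
print(buf.getvalue())
```

Writing to an in-memory io.StringIO keeps the example free of file-system side effects; export() does the same thing against a real output directory.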

The scraper can also be used from Python code:

from hsbcpdf import scraper

st = scraper.get_statement(r".\working\mypdffile.pdf")

data = st.get_json()
df = st.get_df()

get_json() returns data with the following structure:

{
    "main_account": "XXX-YYYYYY-ZZZ",
    "type": "BANK", # or "CARD"
    "statement_date": "25/05/2019",
    "previous_balance": {
        "HKDSavings": {
            "HKD": 50000000.00
        }, 
        "HKDCurrent": {
            "HKD": 69000000.00
        }, 
        "FCYSavings": {
            "USD": 32000000.00, 
            "EUR": 57000000.00
        }
    }, 
    "new_balance": {
        "HKDSavings": {
            "HKD": 100000000.00
        }, 
        "HKDCurrent": {
            "HKD": 9000000.00
        }, 
        "FCYSavings": {
            "USD": 30000000.00, 
            "EUR": 59000000.00
        }
    }, 
    "entries": [
        {
            "account": "HKDSavings",
            "date": "27/04/2019",
            "description": "MONTHLY EARNINGS", 
            "currency": "HKD", 
            "amount": 1000000.00
        }, 
        ...
    ]
}

Dependencies

  • pdfquery (and therefore pdfminer) - to locate the relevant areas in the PDF
  • camelot (and therefore pandas) - to extract the data tables

New Features!

  • Credit card statements are now supported too

Installation

Requires Python 3.7 (other versions have not been tested).

Install the dependencies.

$ pip install pdfquery
$ pip install "camelot-py[cv]"

The packages are also available through conda, but my environment is messed up, so I did not manage to resolve the version conflicts.

Then get the source code from GitHub and install it into your local Python environment by running:

$ python setup.py install

Todos

  • Write (MORE) Tests

License

MIT

Free Software, Hell Yeah!

About

License: MIT License

Languages: Python 100.0%