uktrade / pg-bulk-ingest

Python utility function to ingest data into a SQLAlchemy-defined PostgreSQL table

Home Page: https://pg-bulk-ingest.docs.trade.gov.uk/


pg-bulk-ingest


A Python utility function for ingesting data into SQLAlchemy-defined PostgreSQL tables, automatically migrating them as needed and allowing concurrent reads as much as possible.

Allowing concurrent writes is not an aim of pg-bulk-ingest. It is designed for use in ETL pipelines where PostgreSQL is used as a data warehouse, and the only writes to the table are from pg-bulk-ingest. It is assumed that there is only one pg-bulk-ingest running against a given table at any one time.

Features

pg-bulk-ingest exposes a single function as its API that:

  • Creates the tables if necessary
  • Migrates any existing tables if necessary, minimising locking
  • Ingests data in batches, where each batch is ingested in its own transaction
  • Handles "high-watermarking" to carry on from where a previous ingest finished or errored
  • Optionally performs an "upsert", matching rows on primary key
  • Optionally deletes all existing rows before ingestion
  • Optionally calls a callback just before each batch is visible to other database clients

Visit the pg-bulk-ingest documentation for usage instructions.
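As a rough sketch of how the single-function API fits together, the example below defines a SQLAlchemy table and a `batches` generator, then passes both to `ingest`. The connection string, schema, and table/column names are placeholders, and the exact shape of the yielded tuples is an assumption based on the feature list above — consult the documentation for the authoritative API.

```python
import sqlalchemy as sa

# SQLAlchemy-defined target table; pg-bulk-ingest creates or migrates it as needed.
# The schema, table, and column names here are illustrative placeholders.
metadata = sa.MetaData()
my_table = sa.Table(
    "my_table",
    metadata,
    sa.Column("id", sa.INTEGER, primary_key=True),
    sa.Column("value", sa.VARCHAR(16), nullable=False),
    schema="my_schema",
)

def batches(high_watermark):
    # Called with the high watermark of the previous ingest (or None on the
    # first run), so the generator can carry on from where it left off.
    # Assumed shape: each yield is (high_watermark, batch_metadata, rows),
    # where rows is an iterable of (table, row-tuple) pairs; each batch is
    # ingested in its own transaction.
    rows = (
        (my_table, (1, "a")),
        (my_table, (2, "b")),
    )
    yield None, None, rows

if __name__ == "__main__":
    # Requires a running PostgreSQL instance; the DSN below is a placeholder.
    from pg_bulk_ingest import ingest

    engine = sa.create_engine("postgresql+psycopg://user:password@localhost:5432/postgres")
    with engine.connect() as conn:
        ingest(conn, metadata, batches)
```

Because the only writes to the table come from pg-bulk-ingest, the generator pattern keeps each batch small enough to commit independently while readers continue to see a consistent view of the table.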

About

Python utility function to ingest data into a SQLAlchemy-defined PostgreSQL table

https://pg-bulk-ingest.docs.trade.gov.uk/

License: MIT License


Languages

  • Python 97.8%
  • JavaScript 1.9%
  • Shell 0.3%