alexeyshockov / asyncpg-rkt

A fast PostgreSQL database client library for Python/asyncio

asyncpg-🚀 -- A fast PostgreSQL Database Client Library for Python/asyncio that returns numpy arrays

asyncpg-rkt is a fork of asyncpg, a database interface library designed specifically for PostgreSQL and Python/asyncio. asyncpg is an efficient, clean implementation of PostgreSQL server binary protocol for use with Python's asyncio framework. You can read more about asyncpg in an introductory blog post.

asyncpg-rkt extends asyncpg as follows:

  • Backward compatible with the original asyncpg.
  • The numpy dtype of a fetched query's result can be set explicitly.
  • Such "typed" queries return numpy arrays instead of lists of Record objects.
  • We construct numpy arrays directly from the low-level PostgreSQL protocol, without materializing any Python objects.
  • Object fields are supported as well.
  • The time from receiving the PostgreSQL server's response until Connection.fetch() returns is ~20x shorter, because we avoid the overhead of dealing with Python objects in the result.
  • We return ravel()-ed (flat) indexes of nulls and write NaN/NaT values at the corresponding positions in the array.
  • There is an option to return data by column vs. by row.
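
The null reporting described above can be illustrated without a database. The following is a minimal sketch: the array and the nulls list are built by hand, and the layout of the flat indexes (row * number_of_fields + field position, counted row by row) is an assumption made for this example, not something confirmed by the project. A float field is used because NaN cannot be stored in an integer field.

```python
import numpy as np

# Hand-made stand-in for the (array, nulls) pair returned by a typed query.
dtype = np.dtype([("a", "float64"), ("b", "datetime64[s]")])
array = np.array(
    [
        (1.0, np.datetime64("2022-01-01T00:00:00", "s")),
        (np.nan, np.datetime64("2022-01-02T00:00:00", "s")),  # "a" is null here
        (3.0, np.datetime64("NaT")),                           # "b" is null here
    ],
    dtype=dtype,
)

# ASSUMPTION: flat index = row * n_fields + field position.
nulls = [2, 5]

# Convert the flat indexes back into (row, field) coordinates.
rows, cols = np.unravel_index(nulls, (len(array), len(dtype.names)))
null_cells = [(int(r), dtype.names[c]) for r, c in zip(rows, cols)]

for row, field in null_cells:
    print(f"row {row}, field {field!r} was NULL; stored value: {array[field][row]}")
```

Keeping the null positions separately matters because NaN or NaT in the array could also be a legitimate non-null value for some columns; the index list removes that ambiguity.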

asyncpg-rkt provides the best performance when there are thousands of rows returned and the field types map to numpy.

Read the blog post with the introduction.

asyncpg-🚀 requires Python 3.8 or later and is supported for PostgreSQL versions 9.5 to 14. Older PostgreSQL versions or other databases implementing the PostgreSQL protocol may work, but are not being actively tested.

Documentation

The project documentation can be found here.

See Basic Usage below for how to use the fork's special features.

Performance

In our testing asyncpg is, on average, 3x faster than psycopg2 (and its asyncio variant -- aiopg).

https://raw.githubusercontent.com/athenianco/asyncpg-rkt/master/performance.png

The above results are a geometric mean of benchmarks obtained with PostgreSQL client driver benchmarking toolbench in November 2020 (click on the chart to see full details).

Writing numpy arrays directly yields a further improvement of about 20x.

Features

asyncpg implements PostgreSQL server protocol natively and exposes its features directly, as opposed to hiding them behind a generic facade like DB-API.

This enables asyncpg to have easy-to-use support for:

  • prepared statements
  • scrollable cursors
  • partial iteration on query results
  • automatic encoding and decoding of composite types, arrays, and any combination of those
  • straightforward support for custom data types
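
A few of the features listed above can be sketched with the regular asyncpg connection API. This is only an illustration: the function name feature_tour and the queries are made up for the example, and conn is assumed to be an already open connection obtained from asyncpg.connect().

```python
async def feature_tour(conn):
    """Exercise a few asyncpg features; `conn` is an open asyncpg connection."""
    # Prepared statement: parsed once by the server, can be executed many times.
    stmt = await conn.prepare("SELECT $1::int * 2")
    doubled = await stmt.fetchval(21)

    # Cursors require a transaction; iterating pulls rows in batches
    # instead of loading the whole result set at once.
    squares = []
    async with conn.transaction():
        async for record in conn.cursor("SELECT generate_series(1, 5) AS n"):
            squares.append(record["n"] ** 2)

    # PostgreSQL arrays are decoded into Python lists automatically.
    numbers = await conn.fetchval("SELECT ARRAY[1, 2, 3]")

    return doubled, squares, numbers
```

Running it requires a reachable PostgreSQL server, for example by awaiting feature_tour(conn) inside the run() coroutine shown under Basic Usage.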

Installation

asyncpg-🚀 is available on PyPI and requires numpy 1.21+. Use pip to install:

$ pip install asyncpg-rkt

Basic Usage

import asyncio
import asyncpg
from asyncpg.rkt import set_query_dtype
import numpy as np

async def run():
    conn = await asyncpg.connect(user='user', password='password',
                                 database='database', host='127.0.0.1')
    # Describe the expected result columns as a numpy structured dtype.
    dtype = np.dtype([
        ("a", int),
        ("b", "datetime64[s]"),
    ])
    # A "typed" query returns a numpy array plus the flat indexes of nulls.
    array, nulls = await conn.fetch(
        set_query_dtype('SELECT * FROM mytable WHERE id = $1', dtype),
        10,
    )
    await conn.close()

# asyncio.run() creates and closes the event loop for us.
asyncio.run(run())

License

asyncpg-🚀 is developed and distributed under the Apache 2.0 license, just like the original project.
