pg_analytics
has been succeeded by pg_lakehouse
. Whereas pg_analytics
is designed to query over data stored within Postgres, pg_lakehouse
queries over external data lakes like S3 and table formats like Iceberg, storing no data inside Postgres. Because we are strong believers of separation of storage and compute and have seen greater demand for pg_lakehouse
, we're choosing to focus our development efforts on pg_lakehouse
for the next few months.
As a result, pg_analytics
has been moved out of the main ParadeDB repo. It is possible that development on pg_analytics
will resume in the future.
pg_analytics
is an extension that accelerates analytical query processing inside Postgres. The performance of analytical queries that leverage pg_analytics
is comparable to the performance of dedicated OLAP databases — without the need to extract, transform, and load (ETL) the data from your Postgres instance into another system. The purpose of pg_analytics
is to be a drop-in solution for fast analytics in Postgres with zero ETL.
The primary dependencies are:
- Apache Arrow for column-oriented memory format
- Apache DataFusion for vectorized query execution with SIMD
- Apache Parquet for persistence
- Delta Lake as a storage framework with ACID properties
- pgrx, the framework for creating Postgres extensions in Rust
These libraries are the building blocks of many modern analytical databases and enable column-oriented storage, efficient data compression, and vectorized query execution within Postgres. Please see our blog post for a deep dive into how it works.
With pg_analytics
installed, ParadeDB is the fastest Postgres-based analytical database and outperforms many specialized OLAP systems. On Clickbench, ParadeDB is 94x faster than regular Postgres, 8x faster than Elasticsearch, and almost ties Clickhouse.
For an apples-to-apples comparison, these benchmarks were run on a c6a.4xlarge with 500GB storage. None of the databases were tuned. The (Parquet, single) Clickhouse variant was selected because it most closely matches ParadeDB's Parquet storage.
You can view the ParadeDB ClickBench results against other Postgres-compatible OLAP databases here.
This toy example demonstrates how to get started.
CREATE EXTENSION pg_analytics;
-- Create a parquet table
CREATE TABLE t (a int) USING parquet;
-- pg_analytics supercharges the performance of any
-- Postgres query run on a parquet table
INSERT INTO t VALUES (1), (2), (3);
SELECT COUNT(*) FROM t;
You can interact with parquet
tables the same way as with normal Postgres tables. However, there are a few exceptions.
Because column-oriented storage formats are not designed for fast updates, parquet
tables are append only.
This means that parquet
tables do not support UPDATE
and DELETE
clauses. Data that is frequently updated
should be stored in regular Postgres heap
tables.
The VACUUM FULL <table_name>
command is used to optimize a table's storage by bin-packing small Parquet
files into larger files, which can significantly improve query time and compression.
pg_analytics
is currently in beta.
As pg_analytics
becomes production-ready, many of these will be resolved.
-
UPDATE
andDELETE
- Some Postgres types, notably
json
andtimestamptz
- Write-ahead log (WAL) support
- Collations
-
INSERT ... ON CONFLICT
clauses
To develop the extension, first install Rust v1.76.0 using rustup
. We will soon make the extension compatible with newer versions of Rust:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install 1.76.0
# We recommend setting the default version to 1.76.0 for consistency across your system
rustup default 1.76.0
Note: While it is possible to install Rust via your package manager, we recommend using rustup
as we've observed inconcistencies with Homebrew's Rust installation on macOS.
Then, install the PostgreSQL version of your choice using your system package manager. Here we provide the commands for the default PostgreSQL version used by this project:
# macOS
brew install postgresql@16
# Ubuntu
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update && sudo apt-get install -y postgresql-16 postgresql-server-dev-16
If you are using Postgres.app to manage your macOS PostgreSQL, you'll need to add the pg_config
binary to your path before continuing:
export PATH="$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin"
Then, install and initialize pgrx
:
# Note: Replace --pg16 with your version of Postgres, if different (i.e. --pg15, --pg14, etc.)
cargo install --locked cargo-pgrx --version 0.12.0-alpha.1
# macOS arm64
cargo pgrx init --pg16=/opt/homebrew/opt/postgresql@16/bin/pg_config
# macOS amd64
cargo pgrx init --pg16=/usr/local/opt/postgresql@16/bin/pg_config
# Ubuntu
cargo pgrx init --pg16=/usr/lib/postgresql/16/bin/pg_config
If you prefer to use a different version of Postgres, update the --pg
flag accordingly.
Note: While it is possible to develop using pgrx's own Postgres installation(s), via cargo pgrx init
without specifying a pg_config
path, we recommend using your system package manager's Postgres as we've observed inconsistent behaviours when using pgrx's.
This extension uses Postgres hooks to intercept Postgres queries. In order to enable these hooks, the extension
must be added to shared_preload_libraries
inside postgresql.conf
. If you are using Postgres 16, this file can be found under ~/.pgrx/data-16
.
# Inside postgresql.conf
shared_preload_libraries = 'pg_analytics'
The extension can be developed with or without an optimized build. An optimized build improves query times by 10-20x but also significantly increases build times.
To launch the extension without an optimized build, run
cargo pgrx run
First, switch to latest Rust Nightly (as of writing, 1.77) via:
rustup update nightly
rustup override set nightly
Then, reinstall pgrx
for the new version of Rust:
cargo install --locked cargo-pgrx --version 0.12.0-alpha.1 --force
Finally, run to build in release mode with SIMD:
cargo pgrx run --release
Note that this may take several minutes to execute.
To revert back to the stable version of Rust, run:
rustup override unset
We support two types of benchmarks: ClickBench and TPC-H. ClickBench is ideal for testing analytical queries, while TPC-H is ideal for testing JOINs. To run the benchmarks, cd into the benchmarks/clickbench
or benchmarks/tpch
directory and run ./benchmark.sh -t <flag>
. The -t
flag is the version to benchmark, either x.y.z
or latest
to pull a version from DockerHub, or local
to build the Docker image locally.