IMPORTANT: Nancy has migrated to GitLab.com. The official repository address: https://gitlab.com/postgres.ai/nancy

About

Nancy helps to conduct automated database experiments.

The Nancy Command Line Interface is a unified way to manage automated database experiments either in clouds or on-premise.

What is a Database Experiment?

Database experiment is a set of actions performed to test

(a) specified SQL queries ("workload")
(b) on specified machine / OS / Postgres version ("environment")
(c) against specified database ("object")
(d) with an optional change – some DDL or config change ("target" or "delta").

Two main goals for any database experiment:

(1) validation – check that the specified workload is valid,
(2) benchmark – perform deep SQL query analysis.

Database experiments are needed when you:

add or remove indexes;
for a new DB schema change, want to validate it and estimate migration time;
want to verify some query optimization ideas;
tune database configuration parameters;
do capacity planning and want to stress-test your DB in some environment;
plan to upgrade your DBMS to a new major version;
want to train ML model related to DB optimization.

Currently Supported Features

Works anywhere where Docker can run (checked: Linux Ubuntu/Debian, macOS)
Experiments are conducted in a Docker container with extended Postgres setup
Supported Postgres versions: 11 (default), 10, 9.6
Postgres config specified via options, may be partial
Supported locations for experimental runs:
- Any machine with Docker installed
- AWS EC2:
  - Run on AWS EC2 Spot Instances (using Docker Machine)
  - Allow to specify EC2 instance type
  - Auto-detect and use current lowest EC2 Spot Instance prices
  - Support i3 instances (with NVMe SSD drives)
  - Support arbitrary-size EBS volumes
Support local or remote (S3) files – config, dump, etc
The object (database) can be specified in various ways:
- Plain text
- Synthetic database generated by pgbench
- Dump file (.sql, .gz, .bz2)
What to test (a.k.a. "target" or "delta"):
- Test Postgres parameters change
- Test DDL change (specified as "do" and "undo" SQL to return state)
Supported types of workload:
- Any custom SQL
- Synthetic workload generated by pgbench
- "Real workload" based on Postgres logs (using pgreplay)
For "real workload", allow replaying it with increased speed
Allow to keep container alive for specified time after all steps are done
Collected artifacts:
- pg_stat_statements snapshot
- pg_stat_database, ...
- Workload SQL logs
- Deep SQL query analysis report

Requirements

To use Nancy CLI you need Linux or MacOS with installed Docker.
To run on AWS EC2 instances, you also need:

AWS CLI https://aws.amazon.com/en/cli/
Docker Machine https://docs.docker.com/machine/
jq https://stedolan.github.io/jq/

Installation

In the minimal configuration, only a few steps are needed:

NOTICE: The Additional notes section contains instructions useful in case of docker-related errors during nancy run calls. Alternatively, see Docker's official post-installation instructions for Linux.

Install Docker

Ubuntu/Debian:

sudo apt-get -y install docker
sudo systemctl enable docker
sudo systemctl start docker

RHEL7:

yum -y install docker
systemctl enable docker
systemctl start docker

MacOS (assuming that Homebrew is installed):

brew install docker

Clone this repo and adjust $PATH:

git clone https://gitlab.com/postgres.ai/nancy.git
echo "export PATH=\$PATH:"$(pwd)"/nancy" >> ~/.bashrc
source ~/.bashrc

Install jq

Ubuntu/Debian: sudo apt-get -y install jq
CentOS/RHEL: sudo yum install jq
MacOS: brew install jq

Additionally, to allow use of AWS EC2 instances:

Install AWS CLI https://docs.aws.amazon.com/cli/latest/userguide/installing.html
Install Docker Machine tools https://docs.docker.com/machine/install-machine/

Getting started

Start with these commands:

nancy help
nancy run help

"Hello World!"

Locally, on any Linux or macOS machine:

echo "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
  | bzip2 > ./sample.dump.bz2

# "Clean run": w/o index
# (seqscan is expected, total time ~150ms, depending on resources)
nancy run \
  --db-dump file://$(pwd)/sample.dump.bz2 \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;"

# Now check how a regular btree index affects performance
# (expected total time: ~0.05ms)
nancy run \
  --db-dump file://$(pwd)/sample.dump.bz2 \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;" \
  --delta-sql-do "create index i_hello_world_i on hello_world(i);" \
  --delta-sql-undo "drop index i_hello_world_i;"

AWS EC2:

nancy run \
  --run-on aws \
  --aws-ec2-type "i3.large" \
  --aws-keypair-name awskey \
  --aws-ssh-key-path file://$(echo ~)/.ssh/awskey.pem  \
  --db-dump "create table hello_world as select i from generate_series(1, (10^6)::int) _(i);" \
  --workload-custom-sql "select i from hello_world where i between 10 and 20;"

Additional notes

On Linux, if you experience issues with running (locally) nancy run inside screen or tmux, double-check that Docker is running and add your user to the docker group, as described below. See also: https://docs.docker.com/install/linux/linux-postinstall/.

Ubuntu/Debian:

# Ubuntu/Debian
sudo usermod -aG docker ${USER}
newgrp docker

CentOS/RHEL:

sudo usermod -aG dockerroot ${USER}
newgrp dockerroot

On MacOS, it is recommended to specify --tmp-path explicitly, similar to this:

mkdir ./tmp
nancy run ... --tmp-path "$(pwd)/tmp"

askrana / nancy

About

What is a Database Experiment?

Currently Supported Features

Requirements

Installation

Getting started

"Hello World!"

Additional notes

About

Languages