# chainweb-data

`chainweb-data` stores and serves data from the Kadena Public Blockchain in a form optimized for lookups and analysis by humans. With this reindexed data we can easily determine mining statistics and confirm transaction contents.
`chainweb-data` requires Postgres configured with UTF8 encoding. If you plan to host a `chainweb-data` instance on a cloud machine (e.g. Amazon EC2), we recommend that you run the Postgres instance on an instance-attached storage unit. The `chainweb-node` instance should have good performance to avoid timeouts during fill operations; use SSD storage for its database.
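For reference, a UTF8-encoded database can be created with the standard Postgres tooling (a minimal sketch; the database name simply matches the examples below):

```sh
# Create a UTF8-encoded database for chainweb-data to populate
createdb --encoding=UTF8 chainweb-data
```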
`chainweb-data` can be built with either `cabal`, `stack`, or Nix. Building with `nix` is the most predictable because it is a single build command once you've installed Nix. This process will go significantly faster if you set up the Kadena nix cache as described here. Building with `cabal` or `stack` will probably require a little more knowledge of the Haskell ecosystem.
```sh
git clone https://github.com/kadena-io/chainweb-data
cd chainweb-data
nix-build
```
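If you build with `cabal` or `stack` instead, the standard Haskell workflows should apply (a sketch; GHC and tool versions are not pinned here, and the project layout may require running these from the Haskell project directory):

```sh
cd chainweb-data

# With cabal
cabal update
cabal build

# Or with stack
stack build
```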
By default, `chainweb-data` will attempt to connect to Postgres via the following values:

| Field   | Value     |
|---------|-----------|
| Host    | localhost |
| Port    | 5432      |
| User    | postgres  |
| Pass    | Empty     |
| DB Name | postgres  |
You can alter these defaults via command line flags, or via a Postgres Connection String.
Assuming you have set up Postgres, created a database with `createdb chainweb-data`, and configured permissions for a user named `joe`, either of the following would connect to a local Postgres database at port 5432:
```sh
chainweb-data <command> --service-host=<node> --dbuser=joe --dbname=chainweb-data
chainweb-data <command> --service-host=<node> --dbstring="host=localhost port=5432..."
```
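For reference, the Postgres setup assumed above might look like this (a sketch; the role name and ownership choices are illustrative):

```sh
# Create the role used in the examples, then a database it owns
createuser --pwprompt joe
createdb --owner=joe chainweb-data
```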
`chainweb-data` syncs its data from a running `chainweb-node`. The node's service address is specified with the `--service-host` option. If a custom service endpoint port is used, you can specify it with `--service-port`:
```sh
chainweb-data <command> --service-host=foo.chainweb.com ...
```
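For example, with a custom service port the call might look like this (the host and port here are illustrative):

```sh
chainweb-data <command> --service-host=foo.chainweb.com --service-port=1848 ...
```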
`chainweb-data` also needs some special node configuration. The `server` command needs `headerStream`, and some of the other stats and information made available require `allowReadsInLocal`. Increasing the throttling settings on the node also makes the `backfill` and `gaps` operations dramatically faster.
```yaml
chainweb:
  allowReadsInLocal: true
  headerStream: true
  throttling:
    global: 1000
```
You can find an example node config in this repository in [node-config-for-chainweb-data.yaml](node-config-for-chainweb-data.yaml).
When running `chainweb-data` for the first time, you should run `chainweb-data server -m` (with the necessary DB and node options, of course). This will create the database and start filling it with blocks. Wait a couple of minutes, then run `chainweb-data fill` (again with the necessary options). After the `fill` operation finishes, you can run `server` again with the `-f` option and it will automatically run `fill` once a day to populate the DB with missing blocks.
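Putting those steps together, a first-time setup might look like this (a sketch reusing the connection flags from the earlier examples):

```sh
# First run: create the database (-m) and start ingesting blocks
chainweb-data server -m --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data

# A couple of minutes later, fill in the historical blocks
chainweb-data fill --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data

# Thereafter, serve with a daily automatic fill (-f)
chainweb-data server -f --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
```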
`listen` fetches live data from a `chainweb-node` whose `headerStream` configuration value is `true`.
```
> chainweb-data listen --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
28911337084492566901513774
```
As a new block comes in, its chain number is printed as a single digit. `listen` will continue until you stop it.
`server` is just like `listen`, but it also runs an HTTP server that serves a few endpoints for doing common queries.

Additionally, it can serve an OpenAPI v3 spec of the API when the hidden `--serve-swagger-ui` option is enabled, offering a basic interface for interacting with the API. This feature, however, is kept unofficial for now due to its rudimentary documentation.

By specifying the optional `--no-listen` argument, the server can be made read-only, allowing multiple servers to serve from the same database.
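For example, a read-only API instance could run alongside a separate ingesting instance against the same database (a sketch; flags follow the earlier examples):

```sh
# Ingesting instance: listens to the node and writes to the database
chainweb-data server --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data

# Read-only instance: serves queries without listening for new blocks
chainweb-data server --no-listen --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
```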
The server provides the following endpoints:

- `/txs/recent`: gets a list of recent transactions
- `/txs/search?search=foo&limit=20&offset=40&minheight=100&maxheight=200`: searches for transactions containing the string `foo` or the provided transaction pact id, with the additional option to filter results based on block height
- `/txs/tx?requestkey=<request-key>`: gets the details of a transaction with the given request key
- `/txs/txs?requestkey=<request-key>`: same as `tx`, but returns a list of transactions, which allows the client to handle multiple appearances due to orphans
- `/txs/events?search=foo&limit=20&offset=40&minheight=100&maxheight=200`: searches for transaction events containing the string `foo`, and allows results to be filtered by block height. It also offers pagination with `limit` and `offset` parameters
- `/stats`: returns a few stats such as transaction count and coins in circulation
- `/coins`: returns just the coins in circulation
- `/txs/account/<account-identifier>?token=coin&chainid=12&minheight=100&maxheight=200&limit=20&offset=40`: provides transactions related to the specified account identifier. It includes additional options to filter results based on the token name, chain ID, and block height, as well as pagination controls via `limit` and `offset` parameters
For more detailed information, see the API definition here.
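As a quick check, the recent-transactions endpoint can be queried with `curl` (the host placeholder stands for wherever your `server` instance is reachable):

```sh
# Fetch recent transactions as JSON
curl "http://<chainweb-data-host>/txs/recent"
```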
All of `chainweb-data`'s search endpoints (`/txs/{events,search,account}`) support a common workflow for efficiently retrieving the results of a given search in non-overlapping batches. A request to any one of these endpoints that matches more rows than the number requested with the `limit` query parameter will respond with a `Chainweb-Next` response header containing a token. That token can be used to call the same endpoint with the same query parameters plus the token passed in via the `next` query parameter, in order to retrieve the next batch of results.
`chainweb-data` supports a `Chainweb-Execution-Strategy` request header that can be used (typically by `chainweb-data` operators, by setting it in the API gateway) to enforce an upper bound on the amount of time the server will spend searching for results. Normally, the search endpoints will produce `limit`-many results if the search matches at least that many entries. However, if `Chainweb-Execution-Strategy: Bounded` is passed in, the response can contain fewer than `limit` rows even though there are potentially more matches, if those matches aren't found quickly enough. In such a case, the returned `Chainweb-Next` token will act as a cursor for the search, so it's possible to keep searching by making successive calls with subsequent `Chainweb-Next` tokens.
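For example (host placeholder illustrative):

```sh
# Bound the server-side search time; the response may hold fewer than 20 rows,
# with a Chainweb-Next token acting as a cursor to continue the search
curl -i -H "Chainweb-Execution-Strategy: Bounded" \
  "http://<chainweb-data-host>/txs/search?search=foo&limit=20"
```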
`fill` fills in missing blocks. This command used to be called `gaps`, but it has been improved to encompass all block-filling operations.
```
> chainweb-data fill --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
```
**Deprecated:** The `backfill` command is deprecated and will be removed in future releases. Use the `fill` command instead.
`backfill` rapidly fills the database downward from the lowest block height it can find for each chain.

Note: If your database is empty, you must fetch at least one block for each chain first via `listen` before doing `backfill`! If `backfill` detects any empty chains, it won't proceed.
```
> chainweb-data backfill --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
Backfilling...
[INFO] Processed blocks: 1000. Progress sample: Chain 9, Height 361720
[INFO] Processed blocks: 2000. Progress sample: Chain 4, Height 361670
```
`backfill` will stop when it reaches height 0.
`backfill-transfers` fills entries in the transfers table from the highest block height it can find for each chain down to the height at which events for coinbase transfers began to exist.

Note: If the transfers table is empty, you must fetch at least one row for each chain first via `listen` before doing `backfill-transfers`! If `backfill-transfers` detects any empty chains, it won't proceed.
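Following the pattern of the other commands, an invocation might look like this:

```
> chainweb-data backfill-transfers --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
```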
**Deprecated:** The `gaps` command is deprecated and will be removed in future releases. Use the `fill` command instead.
`gaps` fills in missing blocks that may have been missed during `listen` or `backfill`. Such gaps will naturally occur if you turn `listen` off or use `single`.
```
> chainweb-data gaps --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
[INFO] Processed blocks: 1000. Progress sample: Chain 9, Height 361624
[INFO] Processed blocks: 2000. Progress sample: Chain 9, Height 362938
[INFO] Filled in 2113 missing blocks.
```
`single` allows you to sync a block at any location in the blockchain.
```
> chainweb-data single --chain=0 --height=200 --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
[INFO] Filled in 1 blocks.
```
Note: Even though you specified a single chain/height pair, you might see it report that it filled in more than one block. This is expected, and will occur when orphans/forks are present at that height.
`migrate` allows you to migrate the database schema to the latest version and exit. This can be useful for separating the migration step from running the ETL and/or HTTP service.
```
> chainweb-data migrate --dbuser=joe --dbname=chainweb-data
```
`check-schema` checks the ORM definitions against the DB schema.
```
> chainweb-data check-schema --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
```
A common use case for `chainweb-data` is to run it primarily as a worker process that populates a Postgres database with blockchain data. In this case, `chainweb-data` operators often want to run their own schema migrations and modify the schema according to their needs. Obviously, once arbitrary schema changes are introduced, we cannot guarantee unimpeded operation of `chainweb-data`.

Any operator who wishes to modify the database takes on the responsibility of ensuring that their changes do not interfere with the current operation of `chainweb-data`. An operator is also responsible for considering their changes in the face of future releases of `chainweb-data`.
`chainweb-data` provides a way to help with this process. Every version of `chainweb-data` comes with a set of schema migrations included in the binary, which are applied to the database by default at migration time. These migrations are defined in the `haskell-src/db-schema/migrations` directory. It is possible to override these migrations by calling `chainweb-data` with the optional `--migrations-folder` argument. To add migrations on top of the default set instead, an `--extra-migrations-folder` argument is provided.
The default migrations that come with `chainweb-data` have the following file name format: `X.Y.Z.N_NAME.sql`. Here `X.Y.Z` is the version of `chainweb-data` after which the migration was introduced, and `N` is the migration number. These migrations are executed in "alphabetical order", treating X, Y, Z, and N as the elements by which they are sorted.

The migration procedure fetches the already-executed migrations from the database and checks them against the migrations provided through the `--extra-migrations-folder` and `--migrations-folder` arguments. If the already-executed migrations are a prefix of the expected migrations (i.e. they were run in the correct order with no gaps or extras), the remaining migrations are executed.
By taking advantage of this alphabetical sorting, `chainweb-data` operators can insert custom migrations to be executed at the moment they desire. For example, version 2.3.0 of `chainweb-data` will have migrations done on top of version 2.2.0, thus having migrations named `2.2.0.N_...` (`N>=1`). Creating a custom migration named `2.3.0.0.N_...` will guarantee that it is executed after the new migrations that come with version 2.3.0 and before the new migrations of future versions, which are guaranteed to have a name greater than `2.3.0.1_...`.
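To illustrate the resulting execution order, a migrations folder might sort like this (all file names are hypothetical):

```
2.2.0.1_shipped_with_2.3.0.sql
2.2.0.2_shipped_with_2.3.0.sql
2.3.0.0.1_custom_operator_migration.sql
2.3.0.1_first_migration_of_a_later_release.sql
```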
Likewise, `chainweb-data` operators who run the latest commit from the `master` branch can also inject their migrations. For example, if the latest commit has its last migration named `2.2.0.1_...`, then their migrations can be named `2.2.0.1.N_...`.
It's important to note that running `chainweb-data` from an unreleased commit of the `master` branch is **not officially supported**. Even though we aim to avoid it, we may change new migrations on the `master` branch without notice, so you may have to fix your database manually by undoing migrations and removing `schema_migrations` entries.
`chainweb-data` operators who specialize their database schema are strongly advised to review the incoming migrations **before** upgrading their `chainweb-data` version. This will allow them to detect any potential conflicts and insert new schema migrations at the right moment to accommodate the incoming changes.