kadena-io / chainweb-data

Data ingestion for Chainweb.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Chainweb Data

https://github.com/kadena-io/chainweb-data/actions/workflows/ci.yaml/badge.svg https://github.com/kadena-io/chainweb-data/actions/workflows/nix.yml/badge.svg

Table of Contents

Overview

chainweb-data stores and serves data from the Kadena Public Blockchain in a form optimized for lookups and analysis by humans. With this reindexed data we can easily determine mining statistics and confirm transaction contents.

Requirements

chainweb-data requires Postgres configured with UTF8 encoding. If you plan to host a chainweb-data instance on a cloud machine (e.g. Amazon EC2), we recommend that you run the postgres instance on an instance attached storage unit. The chainweb-node instance should have good performance to avoid timeouts during fill operations, use ssd storage for the db.

Building

chainweb-data can be built with either cabal, stack, or Nix. Building with nix is the most predictable because it is a single build command once you’ve installed Nix. This process will go significantly faster if you set up the Kadena nix cache as described here. Building with cabal or stack will probably require a little more knowledge of the Haskell ecosystem.

git clone https://github.com/kadena-io/chainweb-data
cd chainweb-data
nix-build

Usage

Connecting to the Database

By default, chainweb-data will attempt to connect to Postgres via the following values:

FieldValue
Hostlocalhost
Port5432
Userpostgres
PassEmpty
DB Namepostgres

You can alter these defaults via command line flags, or via a Postgres Connection String.

via Flags

Assuming you had set up Postgres, done a createdb chainweb-data, and had configured user permissions for a user named joe, the following would connect to a local Postgres database at port 5432:

chainweb-data <command> --service-host=<node> --dbuser=joe --dbname=chainweb-data

via a Postgres Connection String

chainweb-data <command> --service-host=<node> --dbstring="host=localhost port=5432..."

Connecting to a Node

chainweb-data syncs its data from a running chainweb-node. The node’s Service address is specified with the --service-host command. If a custom service enpoint port is used, you can specify it with --service-port

chainweb-data <command> --service-host=foo.chainweb.com ...

Configuring the Node

chainweb-data also needs some special node configuration. The server command needs headerStream and some of the other stats and information made available requires allowReadsInLocal. Increasing the throttling settings on the node also makes the backfill and gaps operations dramatically faster.

chainweb:
  allowReadsInLocal: true
  headerStream: true
  throttling:
    global: 1000

You can find an example node config in this repository in [node-config-for-chainweb-data.yaml](node-config-for-chainweb-data.yaml).

How to run chainweb-data

When running chainweb-data for the first time you should run chainweb-data server -m (with the necessary DB and node options of course). This will create the database and start filling the database with blocks. Wait a couple minutes, then run chainweb-data fill (again with the necessary options). After the fill operation finishes, you can run server again with the -f option and it will automatically run fill once a day to populate the DB with missing blocks.

Commands

listen

listen fetches live data from a chainweb-node whose headerStream configuration value is true.

> chainweb-data listen --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
28911337084492566901513774

As a new block comes in, its chain number is printed as a single digit. listen will continue until you stop it.

server

server is just like listen, but also runs an HTTP server that serves a few endpoints for doing common queries. Additionally, it can serve an OpenAPI v3 spec of the API when the hidden --serve-swagger-ui option is enabled, offering a basic interface for interacting with the API. This feature, however, is kept unofficial for now due to its rudimentary documentation.

By specifying the optional --no-listen argument, the server can be made read-only, allowing multiple servers to serve from the same database.

Endpoints

  • /txs/recent gets a list of recent transactions
  • /txs/search?search=foo&limit=20&offset=40&minheight=100&maxheight=200 searches for transactions containing the string foo or the provided transaction pact id, with the additional option to filter results based on block height.
  • /txs/tx?requestkey=<request-key> gets the details of a transaction with the given request key
  • /txs/txs?requestkey=<request-key> same as txs, but returns a list of transactions, which allows the client to handle multiple appearances due to orphans.
  • /txs/events?search=foo&limit=20&offset=40&minheight=100&maxheight=200 searches for transaction events containing the string foo, and allows for results to be filtered by block height. It also offers pagination with limit and offset parameters.
  • /stats returns a few stats such as transaction count and coins in circulation
  • /coins returns just the coins in circulation
  • /txs/account/<account-identifier>?token=coin&chainid=12&minheight=100&maxheight=200&limit=20&offset=40 provides transactions related to the specified account identifier. It includes additional options to filter results based on the token name, chain ID, and block height, as well as pagination controls via limit and offset parameters.

For more detailed information, see the API definition here.

Note about partial search results

All of chainweb-data’s search endpoints (/txs/{events,search,account}) support a common workflow for efficiently retrieving the results of a given search in non-overlapping batches.

A request to any one of these endpoints that match more rows than the number asked with the limit query parameter will respond with a Chainweb-Next response header containing a token. That token can be used to call the same endpoint with the same query parameters plus the token passed in via the next query parameter in order to retreive the next batch of results.

chainweb-data supports a Chainweb-Execution-Strategy request header that can be used (probably by chainweb-data operators by setting it in the API gateway) to enable an upper bound on the amount of time the server will spend for searching results. Normally, the search endpoints will produce the given limit-many results if the search matches at least that many entries. However, if Chainweb-Execution-Strategy: Bounded is passed in, the response can contain less than limit rows even though there are potentially more matches, if those matches aren’t found quickly enough. In such a case, the returned Chainweb-Next token will act as a cursor for the search, so it’s possible to keep searching by making successive calls with subsequent Chainweb-Next tokens.

fill

fill fills in missing blocks. This command used to be called gaps but it has been improved to encompass all block filling operations.

> chainweb-data fill --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data

backfill

Deprecated: The backfill command is deprecated and will be removed in future releases. Use the fill command instead.

backfill rapidly fills the database downward from the lowest block height it can find for each chain.

Note: If your database is empty, you must fetch at least one block for each chain first via listen before doing backfill! If backfill detects any empty chains, it won’t proceed.

> chainweb-data backfill --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
Backfilling...
[INFO] Processed blocks: 1000. Progress sample: Chain 9, Height 361720
[INFO] Processed blocks: 2000. Progress sample: Chain 4, Height 361670

backfill will stop when it reaches height 0.

backfill-transfers

backfill-transfers fills entries in the transfers table from the highest block height it can find for each chain up until the height that events for coinbase transfers began to exist.

Note: If the transfers table is empty, you must fetch at least one row for each chain first via listen before doing backfill-transfers! If backfill-transfers detects any empty chains, it won’t proceed.

gaps

Deprecated: The backfill command is deprecated and will be removed in future releases. Use the fill command instead.

gaps fills in missing blocks that may have been missed during listen or backfill. Such gaps will naturally occur if you turn listen off or use single.

> chainweb-data gaps --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
[INFO] Processed blocks: 1000. Progress sample: Chain 9, Height 361624
[INFO] Processed blocks: 2000. Progress sample: Chain 9, Height 362938
[INFO] Filled in 2113 missing blocks.

single

single allows you to sync a block at any location in the blockchain.

> chainweb-data single --chain=0 --height=200 --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data
DB Tables Initialized
[INFO] Filled in 1 blocks.

Note: Even though you specified a single chain/height pair, you might see it report that it filled in more than one block. This is expected, and will occur when orphans/forks are present at that height.

migrate

migrate allows you to migrate the database schema to the latest version and exit. This can be useful for separating the migration step from running the ETL and/or HTTP service.

> chainweb-data migrate --dbuser=joe --dbname=chainweb-data

check-schema

check-schema is used to perform a check of the ORM definitions against the DB schema.

> chainweb-data check-schema --service-host=foo.chainweb.com --dbuser=joe --dbname=chainweb-data

Specializing the Database Schema

A common use case for chainweb-data is to primarily run it as a worker process to populate a Postgres database with blockchain data. In this case, chainweb-data operators often want to run their schema migrations and modify the schema according to their needs. Obviously, by introducing arbitrary schema changes, we can not guarantee unimpeded operation of chainweb-data. Any node operator that wishes to modify the database, takes on the responsibility of ensuring that their changes do not interfere with the current operation of chainweb-data. A node operator is also responsible for considering their changes in the face of future releases of chainweb-data.

chainweb-data provides a way to help with this process. Any version of chainweb-data comes with a set of schema migrations included in the binary that are applied by default to the database at migration time. These migrations are defined in the haskell-src/db-schema/migrations directory. It is possible to override these migrations by calling chainweb-data with the optional --migrations-folder argument. However, in order to add migrations to the default set, an --extra-migrations-folder argument is provided.

The default migrations that come with chainweb-data have the following file name format: X.Y.Z.N_NAME.sql. Here X.Y.Z is the version of chainweb-data after which the migration was introduced. N is the migration number. These migrations are executed in “alphabetical order” considering X,Y,Z and N to be the elements by which they are sorted. The migration procedure will fetch the already executed migrations from the database and check them against the migrations provided through the --extra-migrations-folder and the --migrations-folder arguments. If the already executed migrations are a prefix (i.e. they were run in the correct order and have no gaps or extras) of the expected migrations, then the rest of the migrations will be executed.

By taking advantage of this alphabetical sorting, chainweb-data operators can insert custom migrations to be executed at the moment they desire. For example, version 2.3.0 of chainweb-data will have migrations done on top of version 2.2.0, thus having migrations named 2.2.0.N_... (N>=1). Creating a custom migration named 2.3.0.0.N_... will guarantee that it’ll be executed after the new migrations that come with version 2.3.0 and before the new migrations of future versions, which are guaranteed to have a name greater than 2.3.0.1_.... Likewise, chainweb-data operators that run the latest commit from the master branch can also inject their migrations. For example, if the latest commit has the last migration named 2.2.0.1_..., then their migrations can be named 2.2.0.1.N_....

It’s important to note, that running chainweb-data from an unreleased commit of the master branch is **not officially supported** and even though we aim to avoid it, we can change new migrations of the master branch without notice, so you may have to fix your database manually by undoing migrations and removing schema_migrations entries.

chainweb-data operators that specialize their database schema are strongly advised to review the incoming migrations **before** they upgrade their chainweb-data versions. This will allow them to detect any potential conflicts and insert new schema migrations to be executed at the right moment, to accommodate the incoming changes.

About

Data ingestion for Chainweb.

License:BSD 3-Clause "New" or "Revised" License


Languages

Language:Haskell 97.8%Language:Nix 2.2%