NEAR Indexer for Explorer

NEAR Indexer for Explorer is built on top of the NEAR Indexer microframework. It watches the network and stores all events in a PostgreSQL database.

Shared Public Access

NEAR runs and maintains the indexer for NEAR Explorer, NEAR Wallet, and some other internal services. It has proved to be a great source of data for various analyses and services, so we decided to provide shared read-only public access to the data:

testnet credentials: postgres://public_readonly:nearprotocol@testnet.db.explorer.indexer.near.dev/testnet_explorer
mainnet credentials: postgres://public_readonly:nearprotocol@mainnet.db.explorer.indexer.near.dev/mainnet_explorer

WARNING: We may evolve the data schemas, so make sure you follow the release notes of this repository.

NOTE: Access to the database is shared with everyone in the world, so please limit the number of queries you run and make sure each query is efficient.
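Most PostgreSQL clients accept the connection string as-is, but some ask for the host, user, and database name separately. As a minimal sketch using only the Python standard library, the testnet string above can be split into its parts like this (the helper name `parse_connection_string` is ours and is not part of this repository):

```python
from urllib.parse import urlsplit

# The shared read-only testnet connection string from this README.
TESTNET_URL = (
    "postgres://public_readonly:nearprotocol"
    "@testnet.db.explorer.indexer.near.dev/testnet_explorer"
)


def parse_connection_string(url):
    """Split a postgres:// URL into the fields most clients ask for."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # The URL carries no port, so fall back to the PostgreSQL default.
        "port": parts.port or 5432,
        "dbname": parts.path.lstrip("/"),
    }


params = parse_connection_string(TESTNET_URL)
```

The resulting dictionary can be passed to whichever PostgreSQL client you prefer; the key names here are only a common convention.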

Self-hosting

The final setup consists of the following components:

PostgreSQL database (you can run it locally or in the cloud), which can hold the whole history of the blockchain (as of January 2022, mainnet takes 1.1TB of data in PostgreSQL storage, and testnet takes 420GB)
NEAR Indexer for Explorer binary that operates as a regular NEAR Protocol peer-to-peer node, so you will operate it as any other Archival Node in NEAR

Prepare Development Environment

Before you proceed, make sure you have the following software installed:

The Rust compiler, at the version specified in the rust-toolchain file in the root of the nearcore project.
libpq-dev dependency

On Debian/Ubuntu:
```
$ sudo apt install libpq-dev
```

Prepare Database

Set up PostgreSQL, create a database with the regular tools, and note the connection string (database host, credentials, and database name).

Clone this repository and open the project folder:

$ git clone git@github.com:near/near-indexer-for-explorer.git
$ cd near-indexer-for-explorer

Provide the database credentials in a .env file as shown below (replace user, password, host, and db_name with your own values):

$ echo "DATABASE_URL=postgres://user:password@host/db_name" > .env
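If the password contains characters such as `@` or `/`, pasting it into the URL unchanged would make the URL ambiguous. The sketch below percent-encodes the user and password before assembling the value; it assumes the connection string is parsed with the usual libpq-style URI rules, and `build_database_url` is a hypothetical helper, not something this repository ships:

```python
from urllib.parse import quote


def build_database_url(user, password, host, db_name):
    """Return a postgres:// URL with user and password percent-encoded."""
    return "postgres://{}:{}@{}/{}".format(
        quote(user, safe=""),      # encode every reserved character
        quote(password, safe=""),
        host,
        db_name,
    )


def env_line(user, password, host, db_name):
    """Return the line to place in the .env file."""
    return "DATABASE_URL=" + build_database_url(user, password, host, db_name)


line = env_line("user", "p@ss/word", "host", "db_name")
```

Write the resulting line to `.env` in the project folder, in place of the `echo` command above.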

Then apply the migrations to create the necessary database structure. This requires diesel-cli, which you can install like so:

$ cargo install diesel_cli --no-default-features --features "postgres"

Then apply the migrations:

$ diesel migration run

Compile NEAR Indexer for Explorer

$ cargo build --release

Configure NEAR Indexer for Explorer

To connect NEAR Indexer for Explorer to a specific chain, you need the necessary configs. You can generate them as follows:

$ ./target/release/near-indexer --home-dir ~/.near/testnet init --chain-id testnet --download-config --download-genesis

The command above downloads the official genesis config and generates the necessary configs. You can replace testnet in the command with a different network ID (betanet, mainnet).

These are the default config files that one could use just for the reference:

Configs for the specified network are in the folder passed as --home-dir. NEAR Indexer for Explorer must follow all the necessary shards, so the "tracked_shards" parameter in ~/.near/testnet/config.json needs to be configured properly. For example, on a single-shard network, you just add shard #0 to the list:

...
"tracked_shards": [0],
...

Run NEAR Indexer for Explorer

The command to run NEAR Indexer for Explorer must include a sync mode.

You can choose the sync mode by setting what to stream:

sync-from-latest - start indexing blocks from the latest finalized block
sync-from-interruption --delta <number_of_blocks> - start indexing from the block where NEAR Indexer was last interrupted, or <number_of_blocks> earlier than that if --delta is provided
sync-from-block --height <block_height> - start indexing blocks from the specific block height
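The effect of `--delta` is plain arithmetic on block heights. The following is an illustrative sketch of the description above, not the indexer's own code; the function name and the clamp at zero are our assumptions:

```python
def resume_height(last_indexed_height, delta=0):
    """Height to restart from: step back `delta` blocks, never below zero."""
    return max(last_indexed_height - delta, 0)


# With the last indexed block at height 1000 and --delta 10,
# indexing would resume ten blocks earlier.
start = resume_height(1000, delta=10)
```

Stepping back a few blocks re-processes data that may have been only partially stored when the indexer stopped.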

Optionally, you can tell Indexer to store data from genesis (Accounts and Access Keys) by adding the --store-genesis flag to the run command.

NEAR Indexer for Explorer works in strict mode by default, but you can disable it for a specific number of blocks. In strict mode, every piece of data that fails to be stored is retried until it succeeds. Errors may occur when a parent piece of data is still being processed while a child piece is already being stored, so Indexer keeps retrying until the write succeeds. However, if you are not running Indexer from genesis, some parent data may genuinely be missing, which makes it impossible to store the child data. In that case you can disable strict mode for 1000 blocks, which takes you past the area of strongly related data to a point where no data can be lost.

To disable strict mode you need to provide:

--non-strict-mode
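To make the retry behavior concrete, here is a small sketch of a retry-until-success loop. It only illustrates the idea and is not how the indexer is implemented. What non-strict mode does on failure is not spelled out here, so the sketch assumes it gives up after the first failed attempt; all names are hypothetical:

```python
import time


def store_with_retry(store, record, strict=True, delay_seconds=0.1,
                     sleep=time.sleep):
    """Call store(record) until it succeeds when strict is True.

    Returns the number of attempts made, or None if the record was
    skipped (assumed non-strict behavior: give up after one failure).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            store(record)
            return attempts
        except Exception:
            if not strict:
                return None
            # The parent row may not be stored yet: wait, then try again.
            sleep(delay_seconds)
```

In strict mode this loop never gives up, which is why a truly missing parent record would block progress forever.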

Sometimes you may want to index blocks while the sync process is still running. By default, an indexer node waits for the full sync to complete, but you can enable indexing while the node is syncing by passing --stream-while-syncing.

By default NEAR Indexer for Explorer processes only a single block at a time. You can adjust this with the --concurrency argument (when the blocks are mostly empty, it is fine to go with as many as 100 blocks of concurrency).
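Conceptually, a concurrency of N means that at most N blocks are being handled at any moment. The sketch below shows that idea with a semaphore in Python; it is an analogy only, since the indexer itself is written in Rust, and `process_blocks` and `handle_block` are hypothetical names:

```python
import asyncio


async def process_blocks(heights, handle_block, concurrency=1):
    """Run handle_block for every height, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(height):
        # Wait for a free slot before starting work on this block.
        async with semaphore:
            return await handle_block(height)

    # gather() returns the results in the same order as the input heights.
    return await asyncio.gather(*(guarded(h) for h in heights))
```

With `concurrency=1` blocks are handled strictly one after another, which matches the default described above.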

So the final command to run NEAR Indexer for Explorer can look like:

$ cargo run --release -- --home-dir ~/.near/testnet run --store-genesis --stream-while-syncing --non-strict-mode --concurrency 1 sync-from-latest
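If you launch the indexer from a script, it can help to assemble the arguments in one place. This sketch uses only the flags and sync modes listed in this README and mirrors their order in the command above; the function itself is a hypothetical helper:

```python
def build_run_args(home_dir, sync_mode, *, delta=None, height=None,
                   store_genesis=False, stream_while_syncing=False,
                   non_strict_mode=False, concurrency=None):
    """Return the arguments that follow `cargo run --release --`."""
    args = ["--home-dir", home_dir, "run"]
    if store_genesis:
        args.append("--store-genesis")
    if stream_while_syncing:
        args.append("--stream-while-syncing")
    if non_strict_mode:
        args.append("--non-strict-mode")
    if concurrency is not None:
        args += ["--concurrency", str(concurrency)]

    # The sync mode comes last, followed by its own option if any.
    if sync_mode == "sync-from-latest":
        args.append(sync_mode)
    elif sync_mode == "sync-from-interruption":
        args.append(sync_mode)
        if delta is not None:
            args += ["--delta", str(delta)]
    elif sync_mode == "sync-from-block":
        if height is None:
            raise ValueError("sync-from-block requires a block height")
        args += [sync_mode, "--height", str(height)]
    else:
        raise ValueError("unknown sync mode: {}".format(sync_mode))
    return args
```

The returned list can be appended to `["cargo", "run", "--release", "--"]` and passed to `subprocess.run`.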

After the network is synced, you should see logs of every block height currently received by NEAR Indexer for Explorer.

Troubleshoot NEAR Indexer for Explorer

Refer to the separate TROUBLESHOOTING.md document.

Database structure

Creating read-only PostgreSQL user

We highly recommend using a separate read-only user to access the data to avoid unexpected corruption of the indexed data.

We use the public schema for all tables. By default, new users can create new tables, views, etc. there. If you want to restrict that, you have to revoke these rights:

REVOKE CREATE ON SCHEMA PUBLIC FROM PUBLIC;
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA PUBLIC FROM PUBLIC;
ALTER DEFAULT PRIVILEGES IN SCHEMA PUBLIC GRANT SELECT ON TABLES TO PUBLIC;

After that, you can create a read-only user in PostgreSQL:

CREATE ROLE readonly;
GRANT USAGE ON SCHEMA public TO readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public to readonly;
-- Put here your limit or just ignore this command
ALTER ROLE readonly SET statement_timeout = '30s';

CREATE USER explorer with login password 'password';
GRANT readonly TO explorer;

$ PGPASSWORD="password" psql -h 127.0.0.1 -U explorer databasename

Syncing

Whenever you run NEAR Indexer for Explorer for any network except localnet, you need to sync with the network. This is the natural behavior of a nearcore node, and NEAR Indexer for Explorer is a wrapper around a regular nearcore node. To work and index data, your node must be synced with the network. This process can take a while, so we suggest downloading a fresh backup of the data folder and putting it in the --home-dir of your choice (by default it is ~/.near).

Running your NEAR Indexer for Explorer node on top of backup data reduces the syncing time, because your node only downloads the missing data, which takes a reasonable amount of time.

All backups can be downloaded from the public S3 bucket, which contains the latest daily snapshots, by following the instructions here.

Running NEAR Indexer for Explorer as archival node

It's not necessary, but in order to index everything in the network it is better to start from genesis. A nearcore node runs in non-archival mode by default, which means the node keeps data only for the last 5 epochs. To index data from genesis, we need to switch the node to archival mode.

To do that, update config.json located in the --home-dir of your choice (by default it is ~/.near).

Find the following keys in the config and update them as follows:

{
  ...
  "archive": true,
  "tracked_shards": [0],
  ...
}
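If you prefer to script this change instead of editing the file by hand, a minimal sketch using the Python standard library could look like the following. It sets only the two keys shown above and leaves everything else untouched; `enable_archival` is a hypothetical helper name:

```python
import json
from pathlib import Path


def enable_archival(config_path, tracked_shards=(0,)):
    """Set "archive" and "tracked_shards" in a nearcore config.json."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text())

    config["archive"] = True
    config["tracked_shards"] = list(tracked_shards)

    # Write the file back; all other keys keep their values.
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `enable_archival("~/.near/testnet/config.json")` updates the testnet config generated earlier. Keep a copy of the original file before overwriting it.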

The syncing process in archival mode can take a lot of time, so it's better to download a backup provided by NEAR and put it in your data folder. After that, your node only needs to sync the missing data, which should take a reasonable amount of time.

All backups can be downloaded from the public S3 bucket, which contains the latest daily snapshots, by following the instructions here.

See this link for reference

Local debugging

If you want to play with the code locally, it's better not to copy the existing mainnet/testnet data (it requires LOTS of memory), but to set up your own small example. You need an empty DB (we suggest using Docker for that). Go through the steps above up to and including the diesel migration. Then run:

$ cargo run --release -- --home-dir ~/.near/localnet init --chain-id localnet

Edit ~/.near/localnet/config.json by adding the tracked shards and the archival option (see the example above).

$ cargo run -- --home-dir ~/.near/localnet run --store-genesis sync-from-latest

Congrats, the blocks are being produced right now! There should be some lines in the DB. Now, we need to generate some activity to add new examples.

$ npm i -g near-cli
$ NEAR_ENV=local near create-account awesome.test.near --initialBalance 30 --masterAccount test.near --keyPath=~/.near/localnet/validator_key.json
$ NEAR_ENV=local near send test.near awesome.test.near 5

All available commands are here.

You can stop and re-run the example at any time. Block production will continue from the last state.

Troubleshooting

When operating normally, you should see "INFO indexer_for_explorer: Block height ..." messages in the logs.

The node is fully synced and running, but no indexer messages and no transactions in the database (not indexing)

Make sure the blocks you want to save exist on the node. Check them via JSON RPC:

curl http://127.0.0.1:3030/ -X POST --header 'Content-type: application/json' --data '{"jsonrpc": "2.0", "id": "dontcare", "method": "block", "params": {"block_id": 9820214}}'

NOTE: Block #9820214 is the first block after genesis block (#9820210) on Mainnet.

If it returns an error saying that the block does not exist or is missing, your node does not have the necessary data. Your options are to start from the blocks that are recorded on the node, or to start an archival node (see above) and make sure you have the full network history. For the latter, either use a backup or let the node sync from scratch; syncing from scratch is quite slow, so a backup is recommended.
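The same check can be scripted. The sketch below builds the request body used in the curl command above and treats a reply as "block present" when it carries a `result` and no `error`, following the JSON-RPC 2.0 convention. The function names are ours, and the exact shape of the node's error payload is not assumed:

```python
import json
from urllib.request import Request, urlopen


def block_request(block_id):
    """Build the JSON-RPC body for the `block` method."""
    return {
        "jsonrpc": "2.0",
        "id": "dontcare",
        "method": "block",
        "params": {"block_id": block_id},
    }


def has_block(response):
    """True when the reply carries a result rather than an error."""
    return "result" in response and "error" not in response


def query_block(block_id, rpc_url="http://127.0.0.1:3030/"):
    """POST the request to the node and return the decoded reply."""
    body = json.dumps(block_request(block_id)).encode("utf-8")
    request = Request(rpc_url, data=body,
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as reply:
        return json.load(reply)
```

With a node listening locally, `has_block(query_block(9820214))` tells you whether the first mainnet block after genesis is available.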
