trsmarc / tx-stream


EVM Transaction Indexer


About the project

The purpose of the project is to create a robust API server and interactive web application to track transaction fee history in the Uniswap V3 USDC/ETH pool, specifically recording the transaction fee in USDT when each transaction is confirmed on the blockchain.
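As a rough illustration of that fee conversion (a sketch only; the repository may compute it differently), the USDT fee can be derived from a receipt's gas usage and an ETH/USDT price at confirmation time:

```python
# Hedged sketch: converts a confirmed transaction's gas cost to USDT.
# The price source (eth_usdt) is an assumption, not this repo's method.
def tx_fee_usdt(gas_used: int, effective_gas_price_wei: int, eth_usdt: float) -> float:
    fee_eth = gas_used * effective_gas_price_wei / 10**18  # wei -> ETH
    return fee_eth * eth_usdt

print(tx_fee_usdt(21_000, 30_000_000_000, 3_000.0))  # ~1.89 USDT
```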

It supports real-time data recording and historical batch data recording, allowing continuous capture of live data and retrieval of past transactions. The system provides extensive RESTful APIs that support horizontal/vertical filters, pagination, ordering, and composite queries.
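For example, here is a minimal sketch of such a composite query against the PostgREST endpoint using standard PostgREST operators (the transactions table name, column names, and port are assumptions; check migrations/init.sql and docker-compose.yaml):

```python
import requests  # pip install requests

BASE = "http://localhost:3000"  # assumed PostgREST port

# Vertical filter via select=, horizontal filter via column operators,
# plus ordering and offset/limit pagination -- standard PostgREST syntax.
resp = requests.get(
    f"{BASE}/transactions",
    params={
        "select": "tx_hash,block_number,fee_usdt",
        "block_number": "gte.12376729",
        "order": "block_number.desc",
        "limit": 10,
        "offset": 0,
    },
)
print(resp.json())
```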

Additionally, it provides a client application that allows users to query transactions by ID/hash and time range, displaying a paginated list of transactions with customizable pagination options (see Client Application - NextJS below).

Notes

To work around external API rate limits and extend usability, this project ships with pre-indexed swap event data for the Uniswap V3 USDC/ETH pool, starting from block 12376729 (the contract deployment block) on the Ethereum mainnet. To start quickly, you can either import the provided .csv file from here into your database, or index from your own block number by running the indexer flow with the start_block parameter (the default initial block is 12376729); see the Appendix section for more details. With the default initial block, reaching the latest block can take up to 40 minutes due to the Etherscan API rate limit.
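A minimal sketch of the CSV import path (the connection string, file name, and transactions table are assumptions; match them to migrations/init.sql and your .env):

```python
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("postgresql://postgres:postgres@localhost:5432/postgres")
with conn, conn.cursor() as cur, open("transactions.csv") as f:
    # COPY ... FROM STDIN streams the whole file into PostgreSQL in one pass.
    cur.copy_expert("COPY transactions FROM STDIN WITH (FORMAT csv, HEADER)", f)
```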

API documentation

For the full API documentation, please check out this section

Getting Started

To start using this application locally, you need to clone this repository along with the submodule repository where the web client application is located.

git clone --recurse-submodules git@github.com:marktrs/tx-stream.git

This project uses Git submodules to manage dependencies within a larger Git repository, referencing other repositories as subdirectories while maintaining separate version control and revision history for each.

Prerequisites

To build using Docker:

To build from the source without Docker:

Usage

Using docker compose

  1. Create a .env file from .env.example and fill in the environment variables, such as ETHERSCAN_API_KEY
$ cp .env.example .env
  2. Start all services
$ make start-server

This will start the following services:

- PostgreSQL: Persistence data store
- PostgREST: RESTful API web server for PostgreSQL
- Prefect server & Prefect Agent: Dataflow automation and worker
- NextJS: Client application
  3. Check out the Prefect Server dashboard at http://localhost:4200/flow-runs/

This pre-configured dashboard will allow you to:

  • Start a new flow run with parameters (e.g. index a new event topic on a different pool) concurrently.
  • Monitor flow run history, status, and metrics to identify bottlenecks and optimize performance.
  • Retry failed flow runs and debug them with logs.
  4. Navigate to http://localhost:8080/ to try out the web application dashboard

Stop all services

This command will remove all running containers, local images, and volumes

$ make stop-server

Run Test

Run unit tests from the source:

make test

System Design

This project is designed for scalability and extensibility. It allows you to add new data sources and processing steps as needed. For example, with the current implementation, you can add:

  1. A new DEX and pool, by adding a new flow deployment with a different contract address.
  2. A new event topic, by adding a new flow deployment with a different event topic.

In terms of reliability and availability, the system is designed to be highly available and scalable. It uses distributed task execution, where workflows in Prefect are standalone objects that can be run at any time. Fault-tolerant scheduling is a crucial Prefect feature that ensures the reliability and availability of data pipelines, especially in production environments.
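As a minimal sketch of what such a standalone, fault-tolerant workflow looks like (illustrative names only, not this repository's actual flow):

```python
from prefect import flow, task

@task(retries=3, retry_delay_seconds=10)
def fetch_logs(start_block: int, end_block: int) -> list:
    # Placeholder for an Etherscan call; an exception here triggers a retry.
    return [{"block": b} for b in range(start_block, end_block)]

@flow
def scan(start_block: int = 12376729, block_range: int = 10):
    return fetch_logs(start_block, start_block + block_range)

if __name__ == "__main__":
    scan()  # flows are plain Python objects and can be run at any time
```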

For more detail, the following sections explain the use cases of the three main components: the API server, dataflow automation, and the client application, along with the API server documentation.

Flow Run

API Server - PostgREST

Using PostgREST eliminates the need for custom API servers and object-relational mapping. It establishes a single source of truth by putting the data itself at the center. With declarative programming, PostgreSQL handles data joins and permissions, reducing coding complexity. PostgREST offers a leak-proof abstraction, bypassing the need for ORMs and allowing efficient API creation using SQL. In short, it streamlines database-centric operations and empowers administrators to build APIs efficiently.

Dataflow Automation - Prefect

To establish efficient data pipelines, a workflow management tool with a web-based UI and API is required. It must support parallel task execution, flexible task definition, and error handling. With a graphical interface, it must provide visibility into pipeline workflows and offer fault-tolerant scheduling. Prefect enables dynamic and parameterized workflows, task caching for faster development, and robust error handling. It's a reliable and adaptable choice for managing workflows efficiently.

Client Application - NextJS

This project requires a framework that supports client-side and server-side rendering, optimized with static and dynamic rendering. NextJS simplifies data fetching with async/await support and aligns with React and the Web Platform. With enhanced TypeScript support, including improved type checking and efficient compilation, it provides a seamless development experience. NextJS adds structure, features, and optimizations to your application, abstracting and configuring tooling like bundling and compiling, allowing you to focus on building your app without the hassle of setup.

API Documentation - Open API

This project uses the OpenAPI specification to document the API server's endpoints and responses. It provides a clear and standardized way of describing the API, enables the automatic generation of documentation and client libraries (which can save significant development time and effort), and provides a machine-readable format that can be used for automated testing.

Project Layout

.
├── apidoc
│   └── ...
├── client
│   └── ...
├── docker
│   └── data
├── flows
│   ├── indexer
│   │   └── ...
│   └── ...
├── migrations
│   └── init.sql
├── tests
│   └── ...
├── Dockerfile
├── Makefile
├── README.md
├── docker-compose.yaml
├── pyproject.toml
├── requirements.txt
└── entrypoint.sh

apidoc: Contains OpenAPI specification and Swagger JSON files

client: Contains the client-side code or files, including frontend code, static assets, or other components related to the user interface.

docker: Docker container volume data

flows: Contains Prefect workflows, modeled as Python functions.

flows/indexer: Contains Python modules (etherscan.py, event_parser.py, scanner.py, store.py, and utils.py) responsible for indexing and processing data from external sources.

migrations: Contains SQL migration files for initializing database schema and structure.

tests: Unit tests for the flow runner and related modules.

root directory: Contains miscellaneous files, including the Makefile, Dockerfile, and shell scripts for deployment.

Next Steps

  • Caching
    • Application Caching - Route-based caching on the client and server to reduce the number of requests to the database and improve application performance.
    • PostgREST API Server Schema Cache - Some PostgREST features need metadata from the database schema. Getting this metadata requires expensive queries. To avoid repeating this work, PostgREST uses a schema cache.
    • Database caching - PostgreSQL has a built-in caching mechanism that caches data in memory. It is called the shared buffer cache, and it is managed by the PostgreSQL buffer manager. The buffer manager is responsible for reading data from the disk into memory, writing changed data back to disk, and maintaining the integrity of the in-memory data.
  • Testing: Broader test coverage beyond the core functionalities
  • gRPC: Provide an efficient binary protocol for faster, more efficient communication between client and server
  • Event decoding: Decode transaction event data to get more information about each transaction (see the sketch below)
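As a hedged sketch of that last item (using the eth-abi library, which is not a stated dependency of this repository), the non-indexed fields of a Uniswap V3 Swap event can be decoded from a log's data field:

```python
from eth_abi import decode  # pip install eth-abi

# Non-indexed fields of Uniswap V3's Swap event, in ABI order.
SWAP_DATA_TYPES = ["int256", "int256", "uint160", "uint128", "int24"]

def decode_swap(data_hex: str) -> dict:
    raw = bytes.fromhex(data_hex.removeprefix("0x"))
    amount0, amount1, sqrt_price_x96, liquidity, tick = decode(SWAP_DATA_TYPES, raw)
    return {"amount0": amount0, "amount1": amount1,
            "sqrtPriceX96": sqrt_price_x96, "liquidity": liquidity, "tick": tick}
```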

Guides

Deploy new event topic indexer

You can create a new flow run with a parameter to index a new event topic on a different pool concurrently by following these steps:

  1. Using Prefect Server Dashboard UI
  • Navigate to http://localhost:4200/flows and locate the deploy_symbol_scanner flow
  • Click : > Quick run, fill in the parameters, then click Run
  • The new deployment will be added to the queue. You can monitor the flow run status on the Flow Runs tab
  • See Flow deployment configuration for the available deployment configurations
  2. Using Prefect CLI
# Attach to the prefect-agent container
$ docker exec -it prefect-agent bash

# Edit the following parameters to match your needs; see the .env.example file for reference
$ export SYMBOLS=
$ export POOL_CONTRACT=
$ export INITIAL_BLOCK=
$ export BLOCK_RANGE=
$ export EVENT_TOPIC=
$ export RESULT_OFFSET=
$ export POLLING_INTERVAL_SEC=


# Create a new deployment for the above symbol / pool / event topic
$ prefect deployment build flows/tx_scan_deployer.py:tx_scan_deployer -n deploy-symbol-scanner --apply --params "{\"symbols\":\"$SYMBOLS\",\"pool_addr\":\"$POOL_CONTRACT\",\"initial_block\":$INITIAL_BLOCK,\"block_range\":$BLOCK_RANGE,\"event_topic\":\"$EVENT_TOPIC\",\"result_offset\":$RESULT_OFFSET, \"polling_interval_sec\":$POLLING_INTERVAL_SEC}"

# Run the new deployment
$ prefect deployment run tx_scan_deployer/deploy-symbol-scanner

Set concurrency limit

  1. Using Prefect Server Dashboard UI
  2. Using Prefect CLI
# Attach to the prefect-agent container
$ docker exec -it prefect-agent bash

# Edit the following parameters to match your needs; see the .env.example file for reference
$ export ETHERSCAN_RATE_LIMIT=

# Set a new concurrency limit
$ prefect concurrency-limit create etherscan_rate_limit $ETHERSCAN_RATE_LIMIT
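Prefect applies such a limit to task runs carrying the matching tag. A minimal sketch of how a flow would opt in (illustrative names, not this repository's actual tasks):

```python
from prefect import flow, task

@task(tags=["etherscan_rate_limit"])  # tag must match the concurrency-limit name
def call_etherscan(page: int) -> int:
    return page  # placeholder for the real API request

@flow
def paged_scan(pages: int = 10):
    # Tasks are submitted concurrently, but Prefect lets at most
    # ETHERSCAN_RATE_LIMIT of them run at the same time.
    return [call_etherscan.submit(p) for p in range(pages)]
```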

Appendix

  1. Docker
  2. PostgREST
  3. Prefect
  4. NextJS
  5. OpenAPI Specification

License

MIT License

