Polygon ETL allows you to set up an ETL pipeline in Google Cloud Platform for ingesting Polygon blockchain data into BigQuery and Pub/Sub. It also comes with CLI tools for exporting Polygon data into convenient formats such as CSVs and relational databases.
- The nodes are run in a Kubernetes cluster.
- Airflow DAGs export and load Polygon data to BigQuery daily. Refer to Polygon ETL Airflow for deployment instructions (a minimal DAG sketch appears after this list).
- Polygon data is polled periodically from the nodes and pushed to Google Pub/Sub. Refer to Polygon ETL Streaming for deployment instructions (see the polling sketch after this list).
- Polygon data is pulled from Pub/Sub, transformed, and streamed to BigQuery. Refer to Polygon ETL Dataflow for deployment instructions (see the Beam sketch after this list).
- Follow the instructions in Polygon ETL Airflow to deploy a Cloud Composer cluster for exporting and loading historical Polygon data. It may take several days for the export DAG to catch up; during this time the "load" and "verify_streaming" DAGs will fail.
- Follow the instructions in Polygon ETL Streaming to deploy the Streamer component. For the value in `last_synced_block.txt`, specify the last block number of the previous day. You can query it in BigQuery: `SELECT number FROM crypto_polygon.blocks ORDER BY number DESC LIMIT 1` (see the BigQuery sketch after this list).
- Follow the instructions in Polygon ETL Dataflow to deploy the Dataflow component. Monitor the "verify_streaming" DAG in the Airflow console; once the Dataflow job catches up to the latest block, the DAG will succeed.
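
The Airflow side can be pictured as a daily DAG that runs an export task followed by a load task. The sketch below is illustrative only and is not the actual polygon-etl-airflow DAG; the DAG id, callables, and start date are placeholders.

```python
# Minimal illustrative Airflow DAG: a daily export task followed by a load task.
# The callables and names below are placeholders, not the real polygon-etl DAGs.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def export_polygon_data(**context):
    # Placeholder: in polygon-etl this step exports blocks, transactions,
    # logs, etc. for the execution date and stages them for loading.
    print(f"exporting Polygon data for {context['ds']}")


def load_to_bigquery(**context):
    # Placeholder: in polygon-etl this step loads the staged files
    # into the crypto_polygon BigQuery dataset.
    print(f"loading Polygon data for {context['ds']} into BigQuery")


with DAG(
    dag_id="polygon_export_and_load_sketch",
    start_date=datetime(2020, 5, 30),
    schedule_interval="@daily",
    catchup=True,  # backfills day by day, which is why catching up takes a while
) as dag:
    export = PythonOperator(task_id="export", python_callable=export_polygon_data)
    load = PythonOperator(task_id="load", python_callable=load_to_bigquery)
    export >> load
```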
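
The Streamer component boils down to: read a checkpoint, poll the node for new blocks, publish them to a Pub/Sub topic, and advance the checkpoint. This polling sketch uses web3 and google-cloud-pubsub rather than the project's own streamer classes; the provider URI, topic name, and checkpoint path are assumptions.

```python
# Illustrative polling loop: read a checkpoint, fetch new blocks from a Polygon
# node, publish them to Pub/Sub, then advance the checkpoint. The topic name,
# provider URI and checkpoint path are assumptions, not polygon-etl's defaults.
import json
import time

from google.cloud import pubsub_v1
from web3 import Web3

PROVIDER_URI = "https://polygon-rpc.com"  # assumption: any Polygon JSON-RPC endpoint
TOPIC = "projects/your-project/topics/crypto_polygon.blocks"  # assumption
CHECKPOINT_FILE = "last_synced_block.txt"

w3 = Web3(Web3.HTTPProvider(PROVIDER_URI))
publisher = pubsub_v1.PublisherClient()


def read_checkpoint() -> int:
    with open(CHECKPOINT_FILE) as f:
        return int(f.read().strip())


def write_checkpoint(block_number: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        f.write(str(block_number))


while True:
    last_synced = read_checkpoint()
    latest = w3.eth.block_number
    for number in range(last_synced + 1, latest + 1):
        block = w3.eth.get_block(number)
        message = json.dumps({
            "number": block["number"],
            "hash": block["hash"].hex(),
            "timestamp": block["timestamp"],
        }).encode("utf-8")
        publisher.publish(TOPIC, message).result()  # wait for the publish to complete
        write_checkpoint(number)
    time.sleep(10)  # poll period; the real streamer is configurable
```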
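
Conceptually, the Dataflow job is a small streaming pipeline: read JSON messages from Pub/Sub, parse and transform them, and stream the rows into BigQuery. The actual polygon-etl Dataflow templates may differ in language and structure; this Beam sketch only shows the shape of the job, and the subscription, table, and schema are assumptions.

```python
# Simplified Apache Beam pipeline: Pub/Sub -> transform -> BigQuery streaming insert.
# The subscription, table and schema below are assumptions for illustration only.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_block(message: bytes) -> dict:
    # Decode a Pub/Sub message into a row shaped for the target table.
    block = json.loads(message.decode("utf-8"))
    return {
        "number": block["number"],
        "hash": block["hash"],
        "timestamp": block["timestamp"],
    }


options = PipelineOptions(streaming=True)  # plus --runner=DataflowRunner etc. on the command line

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/your-project/subscriptions/crypto_polygon.blocks"
        )
        | "ParseBlocks" >> beam.Map(parse_block)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "your-project:crypto_polygon.blocks",
            schema="number:INTEGER,hash:STRING,timestamp:INTEGER",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```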
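
To seed `last_synced_block.txt`, the query above can be run from the BigQuery console or scripted. A small BigQuery sketch using the google-cloud-bigquery client; `your-project` is a placeholder for the project holding the `crypto_polygon` dataset that the Airflow DAGs load into.

```python
# Sketch: query the highest loaded block number and write it to last_synced_block.txt.
# "your-project" is a placeholder; point it at the dataset the Airflow DAGs load into.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT number
FROM `your-project.crypto_polygon.blocks`
ORDER BY number DESC
LIMIT 1
"""

last_block = next(iter(client.query(query).result()))["number"]

with open("last_synced_block.txt", "w") as f:
    f.write(str(last_block))

print(f"last_synced_block.txt seeded with block {last_block}")
```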