astronomer / airflow-duckdb-examples

A repository for examples on using Airflow with DuckDB.

Airflow DuckDB examples

Welcome 👋!

This is a small toy repository for testing different ways of connecting Airflow, DuckDB, and MotherDuck. 🦆

If you are new to Airflow, consider checking out our quickstart repository and the Get started tutorial.

How to use this repository

  1. Make sure you have Docker Desktop installed and running.

  2. Install the Astro CLI.

  3. Clone this repository.

  4. Create a .env file with the contents of the provided .env.example file. If you are using MotherDuck, provide your MotherDuck token.

  5. Start Airflow by running astro dev start.

  6. In the Airflow UI define the following Airflow connections:

    • my_local_duckdb_conn with the following parameters:
      • Conn Type: duckdb
      • File (leave blank for in-memory database): include/my_local_ducks
    • my_motherduck_conn with the following parameters:
      • Conn Type: duckdb
      • File (leave blank for in-memory database):

    You can double-check your connection credentials with the include/test_script.py script. To run the script inside the Airflow scheduler container, run astro dev bash -s and then python include/test_script.py.

  7. Manually trigger DAGs by clicking the play button for each DAG on the right side of the screen.
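The connection check in step 6 can also be sketched by hand. The snippet below is a minimal illustration, not the contents of include/test_script.py. The helper names (duckdb_target, target_from_env, check_connection) are hypothetical, and MOTHERDUCK_TOKEN is an assumed variable name; use whatever .env.example actually defines.

```python
import os


def duckdb_target(local_file: str = "", token: str = "") -> str:
    """Return the string to pass to duckdb.connect().

    token set      -> MotherDuck ("md:" plus the token as a query parameter)
    local_file set -> path of a local database file
    neither        -> in-memory database
    """
    if token:
        return f"md:?motherduck_token={token}"
    return local_file or ":memory:"


def target_from_env() -> str:
    """Prefer MotherDuck when a token is present, else the local file."""
    # MOTHERDUCK_TOKEN is an assumed name, not taken from this repository.
    token = os.environ.get("MOTHERDUCK_TOKEN", "")
    return duckdb_target("include/my_local_ducks", token)


def check_connection(target: str) -> int:
    """Open the target, run a trivial query, and return its result."""
    import duckdb  # imported lazily so the helpers above need no extra packages

    conn = duckdb.connect(target)
    try:
        return conn.execute("SELECT 42").fetchone()[0]
    finally:
        conn.close()
```

Calling check_connection(target_from_env()) inside the scheduler container should return the query result if the file path or token is valid, and raise an error otherwise.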

DAGs

This repo contains 4 DAGs showing different ways to interact with DuckDB and MotherDuck from within Airflow:

  • duckdb_in_taskflow: This DAG uses the duckdb Python package directly to connect to DuckDB and MotherDuck. Note that some tasks will fail if no MotherDuck token is provided.
  • duckdb_provider_example: This DAG uses the DuckDBHook from the DuckDB Airflow provider to connect to DuckDB and MotherDuck.
  • duckdb_custom_operator_example: This DAG uses the custom local operator ExcelToDuckDBOperator which is stored in include/duckdb_operator.py to load the contents of an Excel file (include/ducks_in_the_pond) into a DuckDB or MotherDuck database.
  • duckdb_and_astro_sdk_example: This DAG uses the Astro SDK to run a simple ELT pipeline against a local DuckDB database. Note that Astro SDK support for MotherDuck is coming soon. :)
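To make the first approach more concrete, here is a minimal sketch of what the body of a task that uses the duckdb package directly might look like. It is not code from this repository: the ducks table, its columns, and the function names are made up for illustration.

```python
def load_ducks(conn, rows):
    """Create a small table, insert the rows, and return the row count.

    `conn` is any connection with a DB-API style execute(), for example
    the object returned by duckdb.connect().
    """
    # Hypothetical table used only for this example.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ducks (name VARCHAR, quacks INTEGER)"
    )
    for name, quacks in rows:
        conn.execute("INSERT INTO ducks VALUES (?, ?)", [name, quacks])
    return conn.execute("SELECT count(*) FROM ducks").fetchone()[0]


def run_in_memory(rows):
    """Run load_ducks against a throwaway in-memory DuckDB database."""
    import duckdb  # lazy import, so the module loads without duckdb installed

    conn = duckdb.connect(":memory:")
    try:
        return load_ducks(conn, rows)
    finally:
        conn.close()
```

In a DAG file, a function like run_in_memory would typically be decorated with Airflow's @task so that its return value is passed on through XCom. Keeping the SQL in a separate function that receives the connection makes it easy to point the same logic at a local file or at MotherDuck.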
