Airflow DuckDB examples
Welcome 👋 !
This is a small toy repository for you to test different ways of connecting Airflow, DuckDB and MotherDuck.
If you are new to Airflow, consider checking out our quickstart repository and Get started tutorial.
How to use this repository
- Make sure you have Docker Desktop installed and running.
- Install the Astro CLI.
- Clone this repository.
- Create a .env file with the contents of the provided .env.example file. If you are using MotherDuck, provide your MotherDuck token.
- Start Airflow by running astro dev start.
- In the Airflow UI define the following Airflow connections:
  - my_local_duckdb_conn with the following parameters:
    - Conn Type: duckdb
    - File (leave blank for in-memory database): include/my_local_ducks
  - my_motherduck_conn with the following parameters:
    - Conn Type: duckdb
    - File (leave blank for in-memory database):
- You can double check your connection credentials using the include/test_script.py script. To run the script inside of the Airflow scheduler container run astro dev bash -s and then python include/test_script.py.
- Manually trigger DAGs by clicking the play button for each DAG on the right side of the screen.
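The connection parameters above map fairly directly onto what the duckdb Python package expects. The following is a minimal sketch of a connection check, not the contents of include/test_script.py (which are not shown here). The helper names duckdb_target and check_connection are hypothetical, and the md: prefix with a motherduck_token query parameter is assumed to be how your MotherDuck account is addressed.

```python
from typing import Optional


def duckdb_target(file: str = "", token: Optional[str] = None) -> str:
    """Return the string to pass to duckdb.connect().

    Hypothetical helper: mirrors the "File" field of the Airflow connection.
    """
    if token:
        # Assumption: MotherDuck databases are addressed with the "md:" prefix
        # and the token is passed as a query parameter.
        return f"md:{file}?motherduck_token={token}"
    # A blank file means an in-memory database.
    return file if file else ":memory:"


def check_connection(target: str) -> int:
    """Open the target, run a trivial query, and return its single value."""
    import duckdb  # imported lazily so the helper above works without duckdb

    con = duckdb.connect(target)
    try:
        return con.execute("SELECT 42").fetchone()[0]
    finally:
        con.close()


# Example usage (requires the duckdb package to be installed):
#   check_connection(duckdb_target("include/my_local_ducks"))
#   check_connection(duckdb_target(token="<your MotherDuck token>"))
```

Keeping the target string separate from the connection call makes it easy to see which database a given connection definition points at before anything is opened.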
DAGs
This repo contains 4 DAGs showing different ways to interact with DuckDB and MotherDuck from within Airflow:
- duckdb_in_taskflow: This DAG uses the duckdb Python package directly to connect. Note that some tasks will fail if no MotherDuck token was provided.
- duckdb_provider_example: This DAG uses the DuckDBHook from the DuckDB Airflow provider to connect to DuckDB and MotherDuck.
- duckdb_custom_operator_example: This DAG uses the custom local operator ExcelToDuckDBOperator, which is stored in include/duckdb_operator.py, to load the contents of an Excel file (include/ducks_in_the_pond) into a DuckDB or MotherDuck database.
- duckdb_and_astro_sdk_example: This DAG uses the Astro SDK to perform a simple ELT pipeline with local DuckDB. Note that Astro SDK support for MotherDuck is coming soon. :)
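To give a feel for the first pattern (using the duckdb package directly inside a TaskFlow task), here is a rough sketch. It is not the code of any DAG in this repository. The table name ducks, the sample rows, the DAG name duckdb_taskflow_sketch and the helper load_ducks are all made up for illustration, and schedule=None assumes Airflow 2.4 or later.

```python
from datetime import datetime
from typing import Iterable, Tuple


def load_ducks(con, rows: Iterable[Tuple[str, str]]) -> int:
    """Insert rows into a hypothetical `ducks` table and return its row count.

    `con` is any connection exposing execute() and executemany(), such as
    the object returned by duckdb.connect().
    """
    con.execute("CREATE TABLE IF NOT EXISTS ducks (name VARCHAR, pond VARCHAR)")
    con.executemany("INSERT INTO ducks VALUES (?, ?)", list(rows))
    return con.execute("SELECT count(*) FROM ducks").fetchone()[0]


def build_dag():
    """Build the DAG; imports are local so this module loads without Airflow."""
    import duckdb
    from airflow.decorators import dag, task

    @dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
    def duckdb_taskflow_sketch():
        @task
        def load_and_count() -> int:
            # Same file as in the my_local_duckdb_conn connection.
            # With the DuckDB provider, the connection would instead come
            # from DuckDBHook (exact call shape: check the provider docs).
            con = duckdb.connect("include/my_local_ducks")
            try:
                return load_ducks(con, [("Donald", "Lake"), ("Daisy", "Pond")])
            finally:
                con.close()

        load_and_count()

    return duckdb_taskflow_sketch()


# In a real DAG file, assign the result at module level so Airflow finds it:
#   dag_object = build_dag()
```

Opening and closing the connection inside the task keeps each task run self-contained, which matters because a local DuckDB file allows only one writing process at a time.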