There are 3 repositories under the datafusion topic.
Apache DataFusion SQL Query Engine
The portable Python dataframe library
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Apache Arrow DataFusion Comet Spark Accelerator
Analytical database for data-driven Web applications 🪶
Query and transform data with PRQL
A Python library to run analytics workloads with the performance of Rust, the flexibility of Python and O(1) cost in moving data between the two. Uses Apache Arrow in-memory format and respective query engine DataFusion.
ETL engine: a lightweight, cross-platform ETL engine with unified stream and batch processing for data extraction, transformation, and loading
Rust implementation of Apache Iceberg with integration for DataFusion
Java binding to Apache Arrow DataFusion
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Scale-to-zero Seafowl hosting with Cloud Run
Community InfluxDB 3.0 "IOx" static builds + containers + Examples for Developers & Integrators. Experiment with low-cost storage, unlimited cardinality and FlightSQL APIs
Experimental Elixir bindings for Apache Arrow including Parquet and DataFusion
Awesome list of alternative dataframe libraries in Python.
Public repository of our IGARSS 2023 submission
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow DataFusion at its core.
ModelarDB: Model-Based Time Series Management from Edge to Cloud
proof-of-concept: compile datafusion to `wasm32-wasi` (run in `wasmedge`) and `wasm32-unknown-unknown` (run in browser)
Bigtable data source for Apache Arrow DataFusion
Apache DataFusion JVM User Defined Functions (UDF), the integration nobody asked for 😀
Torchfusion is a very opinionated torch inference on datafusion.
A file explorer for data warehouses