DataFusion is a distributed compute platform implemented in Rust. It is inspired by Apache Spark and offers a similar programming style through DataFrames and SQL.
DataFusion can also be used as a crate dependency in your project if you want to run SQL queries and DataFrame-style data manipulation in-process against your own data sources. In that respect, DataFusion is inspired by Apache Calcite in the Java world.
The project home page is now at https://datafusion.rs and contains the roadmap as well as documentation for using this crate or running DataFusion as a distributed cluster. I am using GitHub issues to track development tasks and feedback.
- Rust nightly
- Thrift (required by the parquet-rs crate) - instructions here
See BUILDING.md.
There is a Gitter channel where you can ask questions about the project or suggest features.
Contributors are welcome! Please see CONTRIBUTING.md for details.