hntd187 / datafusion

SQL Query Execution in Rust

Home Page: https://datafusion.rs/

DataFusion: SQL Query Execution in Rust

DataFusion is a SQL parser, planner, and query execution library for Rust. A DataFrame API is also provided.

The following features are currently supported:

  • SQL Parser, Planner and Optimizer
  • DataFrame API
  • Columnar processing using Apache Arrow
  • Support for local CSV and Apache Parquet files
  • Single-threaded execution of SQL queries, supporting:
    • Projection
    • Selection
    • Scalar Functions
    • Aggregates (Min, Max, Count)
    • Grouping
  • User-defined Scalar Functions (UDFs)
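To make the execution features above more concrete, here is a toy, self-contained Rust sketch of what selection, projection, grouping, and the MIN/MAX/COUNT aggregates compute over column-oriented data. It deliberately does not use DataFusion or Apache Arrow: the `Batch` and `Agg` types and the `run` function are hypothetical names invented for illustration, and plain `Vec`s stand in for Arrow arrays.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy columnar batch: each column is stored in its own Vec,
/// loosely mimicking a columnar layout. NOT a DataFusion or Arrow type.
struct Batch {
    city: Vec<String>,
    temp: Vec<i64>,
}

/// Per-group aggregate state for MIN, MAX and COUNT.
#[derive(Debug, PartialEq)]
struct Agg {
    min: i64,
    max: i64,
    count: u64,
}

/// Roughly equivalent to:
///   SELECT city, MIN(temp), MAX(temp), COUNT(temp)
///   FROM batch WHERE temp > threshold GROUP BY city
fn run(batch: &Batch, threshold: i64) -> BTreeMap<String, Agg> {
    // Selection: evaluate the predicate over the whole column into a mask.
    let mask: Vec<bool> = batch.temp.iter().map(|&t| t > threshold).collect();

    // Projection, grouping and aggregation in one pass over the selected rows.
    let mut groups: BTreeMap<String, Agg> = BTreeMap::new();
    for (i, keep) in mask.iter().enumerate() {
        if !*keep {
            continue;
        }
        let t = batch.temp[i];
        groups
            .entry(batch.city[i].clone())
            .and_modify(|a| {
                a.min = a.min.min(t);
                a.max = a.max.max(t);
                a.count += 1;
            })
            .or_insert(Agg { min: t, max: t, count: 1 });
    }
    groups
}

fn main() {
    let batch = Batch {
        city: vec!["Oslo", "Rome", "Oslo", "Rome", "Oslo"]
            .into_iter()
            .map(String::from)
            .collect(),
        temp: vec![5, 20, 12, 25, -3],
    };
    // BTreeMap keeps group keys sorted, so output order is deterministic.
    for (city, agg) in &run(&batch, 0) {
        println!("{city}: min={} max={} count={}", agg.min, agg.max, agg.count);
    }
}
```

With a threshold of 0 the `-3` reading is filtered out, leaving two rows per city. Building the predicate mask over the whole column first, rather than testing each row inside the aggregation loop, hints at the column-at-a-time style columnar engines favor; a real engine would also vectorize the aggregation itself.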

DataFusion can be used as a crate dependency in your project to add SQL support for custom data sources.
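Adding SQL support for a custom data source means the engine pulls data from code you supply. The sketch below shows only that general shape, assuming a pull-based, batch-at-a-time design. The `DataSource` trait, `CounterSource`, and `count_rows` are hypothetical and are not DataFusion's real interface; consult the crate documentation for the actual trait to implement.

```rust
/// Hypothetical interface, NOT DataFusion's real trait. It only illustrates
/// an engine pulling columnar batches from user-provided code.
trait DataSource {
    /// Column names exposed by this source.
    fn columns(&self) -> Vec<&'static str>;
    /// Returns the next batch (one Vec per column), or None when exhausted.
    fn next_batch(&mut self) -> Option<Vec<Vec<i64>>>;
}

/// A custom source yielding the integers 0..limit in fixed-size batches.
struct CounterSource {
    next: i64,
    limit: i64,
    batch_size: i64,
}

impl DataSource for CounterSource {
    fn columns(&self) -> Vec<&'static str> {
        vec!["n"]
    }

    fn next_batch(&mut self) -> Option<Vec<Vec<i64>>> {
        if self.next >= self.limit {
            return None;
        }
        let end = (self.next + self.batch_size).min(self.limit);
        let column: Vec<i64> = (self.next..end).collect();
        self.next = end;
        Some(vec![column])
    }
}

/// An engine-side operator that works with any source: COUNT(*) over all batches.
fn count_rows(source: &mut dyn DataSource) -> usize {
    let mut rows = 0;
    while let Some(batch) = source.next_batch() {
        // Every column in a batch has the same length; use the first one.
        rows += batch.first().map_or(0, |col| col.len());
    }
    rows
}

fn main() {
    let mut src = CounterSource { next: 0, limit: 10, batch_size: 4 };
    println!("columns = {:?}", src.columns());
    println!("rows = {}", count_rows(&mut src)); // batches of 4, 4 and 2 rows
}
```

Because `count_rows` only depends on the trait, the same operator runs unchanged over any source that implements it, which is the property that makes an embeddable query engine useful for custom data.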

A Docker image is also available if you just want to run SQL queries against your CSV and Parquet files.

I have plans to make DataFusion a fully distributed compute platform with features similar to Apache Spark, but I need help from contributors to get there.

Project Home Page

The project home page is now at https://datafusion.rs and contains the roadmap as well as documentation for using this crate. I am using GitHub issues to track development tasks and feedback.

Prerequisites

  • Rust nightly (required by the parquet-rs crate)

Building DataFusion

See BUILDING.md.

Gitter

There is a Gitter channel where you can ask questions about the project or suggest new features.

Contributing

Contributors are welcome! Please see CONTRIBUTING.md for details.

License: Apache License 2.0