junwen12221 / datafuse

A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, built to make the Data Cloud easy

Home Page: https://datafuse.rs

Datafuse

Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture

Built to make the Data Cloud easy!

Principles

  • Fearless

    • No data races, no unsafe code, minimize unhandled errors
  • High Performance

    • Everything is parallelized
  • High Scalability

    • Everything is distributed
  • High Reliability

    • Datafuse's primary design goal is reliability
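As an illustration of what the "Fearless" principles can mean in Rust, here is a minimal sketch that is not taken from the Datafuse codebase (parallel_sum and OverflowError are hypothetical names). Work is split across threads that each borrow a disjoint chunk, so the compiler rules out data races; the unsafe_code lint is set to forbid; and overflow is reported as a Result instead of a panic.

```rust
use std::thread;

/// Hypothetical error returned instead of panicking when a sum does not fit in a u64.
#[derive(Debug, PartialEq)]
struct OverflowError;

/// Hypothetical example: sums `values` on up to `workers` threads.
/// Each thread borrows its own disjoint chunk, so the borrow checker rules out
/// data races without locks. `forbid(unsafe_code)` turns any `unsafe` block in
/// this function into a compile error (crate-wide: `#![forbid(unsafe_code)]`).
#[forbid(unsafe_code)]
fn parallel_sum(values: &[u64], workers: usize) -> Result<u64, OverflowError> {
    let workers = workers.max(1); // avoid dividing by zero
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);
    thread::scope(|s| -> Result<u64, OverflowError> {
        // Spawn one scoped thread per chunk; checked addition turns overflow
        // into an error value rather than a panic or silent wrap-around.
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .try_fold(0u64, |acc, &v| acc.checked_add(v).ok_or(OverflowError))
                })
            })
            .collect();
        // Merge the partial sums, propagating the first error with `?`.
        let mut total = 0u64;
        for handle in handles {
            let partial = handle.join().expect("worker thread panicked")?;
            total = total.checked_add(partial).ok_or(OverflowError)?;
        }
        Ok(total)
    })
}
```

For example, parallel_sum(&data, 16) returns Ok(total), or Err(OverflowError) if any partial or merged sum exceeds u64::MAX.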

Architecture

Datafuse Architecture

Performance

  • In-memory SIMD vector processing performance only
  • Dataset: 100,000,000,000 rows (100 billion); the ORDER BY query uses 10 billion rows and the GROUP BY query 1 billion
  • Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
  • Rust: rustc 1.55.0-nightly (868c702d0 2021-06-30)
  • Built with link-time optimization and CPU-specific instructions
  • ClickHouse server version 21.4.6 revision 54447
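The setup above measures in-memory, SIMD-vectorized processing only. One common way to write a reduction that a compiler can map onto SIMD registers is to keep several independent per-lane accumulators, sketched below as an illustration rather than Datafuse code (lane_sum and the lane count of 8 are arbitrary choices). Link-time optimization and CPU-specific instructions are build settings; in a Cargo project they would typically be enabled with lto = true in the release profile and RUSTFLAGS="-C target-cpu=native", which is an assumption about how this benchmark was built.

```rust
/// Illustrative only: sums `values` using 8 independent accumulators ("lanes").
/// Independent lanes avoid one long serial dependency chain, which gives the
/// optimizer room to vectorize the inner loop. Addition wraps on overflow here.
fn lane_sum(values: &[u64]) -> u64 {
    const LANES: usize = 8;
    let mut acc = [0u64; LANES];
    let chunks = values.chunks_exact(LANES);
    // Elements left over when the length is not a multiple of LANES.
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] = acc[i].wrapping_add(chunk[i]);
        }
    }
    // Horizontal reduction of the lanes, then add the leftover elements.
    let mut total = acc.iter().fold(0u64, |a, &b| a.wrapping_add(b));
    for &v in tail {
        total = total.wrapping_add(v);
    }
    total
}
```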
| Query | FuseQuery (v0.4.48-nightly) | ClickHouse (v21.4.6) |
|-------|-----------------------------|----------------------|
| SELECT avg(number) FROM numbers_mt(100000000000) | 4.35 s (22.97 billion rows/s, 183.91 GB/s) | ×1.4 slower: 6.04 s (16.57 billion rows/s, 132.52 GB/s) |
| SELECT sum(number) FROM numbers_mt(100000000000) | 4.20 s (23.79 billion rows/s, 190.50 GB/s) | ×1.4 slower: 5.90 s (16.95 billion rows/s, 135.62 GB/s) |
| SELECT min(number) FROM numbers_mt(100000000000) | 4.92 s (20.31 billion rows/s, 162.64 GB/s) | ×2.7 slower: 13.05 s (7.66 billion rows/s, 61.26 GB/s) |
| SELECT max(number) FROM numbers_mt(100000000000) | 4.77 s (20.95 billion rows/s, 167.78 GB/s) | ×3.0 slower: 14.07 s (7.11 billion rows/s, 56.86 GB/s) |
| SELECT count(number) FROM numbers_mt(100000000000) | 2.91 s (34.33 billion rows/s, 274.90 GB/s) | ×1.3 slower: 3.71 s (26.93 billion rows/s, 215.43 GB/s) |
| SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 19.83 s (5.04 billion rows/s, 40.37 GB/s) | ×12.1 slower: 233.71 s (427.87 million rows/s, 3.42 GB/s) |
| SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 3.90 s (25.62 billion rows/s, 205.13 GB/s) | ×2.5 slower: 9.70 s (10.31 billion rows/s, 82.52 GB/s) |
| SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 8.28 s (12.07 billion rows/s, 96.66 GB/s) | ×4.0 slower: 32.87 s (3.04 billion rows/s, 24.34 GB/s) |
| SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 100 | 4.80 s (2.08 billion rows/s, 16.67 GB/s) | ×2.9 slower: 13.95 s (716.62 million rows/s, 5.73 GB/s) |
| SELECT max(number), sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | 6.31 s (158.49 million rows/s, 1.27 GB/s) | ×1.02 faster: 6.18 s (161.84 million rows/s, 1.29 GB/s) |
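The throughput columns follow from the row count and the elapsed time. Assuming each row is a single 8-byte u64 number and GB means 10^9 bytes, 100 billion rows in 4.35 s gives about 22.99 billion rows/s and 183.9 GB/s, close to the reported 22.97 billion rows/s and 183.91 GB/s (the printed times are rounded). The ×N factors appear to be the ratio of the two elapsed times, e.g. 6.04 / 4.35 ≈ 1.39. A small sketch of that arithmetic (throughput and time_ratio are hypothetical helper names):

```rust
/// Rows per second and GB per second for `rows` rows processed in `seconds`.
/// Assumes one 8-byte u64 column per row and 1 GB = 10^9 bytes.
fn throughput(rows: f64, seconds: f64) -> (f64, f64) {
    let rows_per_sec = rows / seconds;
    let gb_per_sec = rows_per_sec * 8.0 / 1e9;
    (rows_per_sec, gb_per_sec)
}

/// How many times longer `other_secs` took than `baseline_secs`;
/// a value below 1.0 means `other` was faster.
fn time_ratio(other_secs: f64, baseline_secs: f64) -> f64 {
    other_secs / baseline_secs
}
```

With FuseQuery as the baseline, time_ratio(13.05, 4.92) ≈ 2.65 matches the ×2.7 entry for min(number), and time_ratio(6.18, 6.31) < 1 matches the GROUP BY row, where ClickHouse is faster.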

Note:

  • ClickHouse system.numbers_mt uses 16-way parallel processing (gist)
  • FuseQuery system.numbers_mt uses 16-way parallel processing (gist)
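To make "16-way parallel processing" concrete for the sum/count/min/max queries, here is a minimal sketch under simple assumptions: the range 0..n is split into contiguous slices, each slice is aggregated on its own thread, and the per-thread partial states are merged. It is not the actual implementation of either engine; aggregate_numbers_mt and Partial are hypothetical names.

```rust
use std::thread;

/// Partial aggregate state for sum/count/min/max over one slice of numbers.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Partial {
    sum: u64,
    count: u64,
    min: u64,
    max: u64,
}

impl Partial {
    /// Identity element for `merge` (state of an empty slice).
    const EMPTY: Partial = Partial { sum: 0, count: 0, min: u64::MAX, max: 0 };

    /// Combine two partial states; the sum wraps on overflow in this sketch.
    fn merge(self, other: Partial) -> Partial {
        Partial {
            sum: self.sum.wrapping_add(other.sum),
            count: self.count + other.count,
            min: self.min.min(other.min),
            max: self.max.max(other.max),
        }
    }
}

/// Hypothetical stand-in for aggregating `numbers_mt(n)` with `ways` threads.
/// Assumes n is well below u64::MAX so the range arithmetic cannot overflow.
fn aggregate_numbers_mt(n: u64, ways: u64) -> Partial {
    let ways = ways.max(1);
    let step = (n + ways - 1) / ways; // slice length, rounded up
    thread::scope(|s| {
        let handles: Vec<_> = (0..ways)
            .map(|i| {
                let start = (i * step).min(n);
                let end = ((i + 1) * step).min(n);
                // Each thread folds its own slice into a local partial state.
                s.spawn(move || {
                    (start..end).fold(Partial::EMPTY, |p, x| Partial {
                        sum: p.sum.wrapping_add(x),
                        count: p.count + 1,
                        min: p.min.min(x),
                        max: p.max.max(x),
                    })
                })
            })
            .collect();
        // Final merge of the per-thread states on the calling thread.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold(Partial::EMPTY, Partial::merge)
    })
}
```

Dividing the merged sum by the merged count then gives avg(number). Because merge is associative, the threads never need to coordinate until the final step.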

Getting Started

Roadmap

Datafuse is currently in alpha and is not ready for production use. See Roadmap 2021.

Contributing

License

Datafuse is licensed under Apache 2.0.

Languages

Rust 72.7%, TypeScript 11.8%, SCSS 7.3%, HTML 6.1%, Shell 1.2%, Python 0.7%, Makefile 0.1%, Dockerfile 0.1%, Smarty 0.1%