ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS

Home Page:https://clickhouse.com

A list of benchmarks

alexey-milovidov opened this issue · comments

Post every link to a benchmark here.

Brown university: https://github.com/crottyan/mgbench
Abandoned, but still quite fresh.

h2o.ai dataframe-like benchmark: https://h2oai.github.io/db-benchmark/

AMPLab benchmark: https://amplab.cs.berkeley.edu/benchmark/
Outdated and abandoned, does not include ClickHouse.

TSBS benchmark: https://altinity.com/blog/clickhouse-for-time-series
(2018 version, does not have the read-in-order optimization)

ClickHouse vs Kudu on TSBS (from Cloudera, in favor of Kudu):
https://blog.cloudera.com/benchmarking-time-series-workloads-on-apache-kudu-using-tsbs/

ClickHouse vs. MariaDB ColumnStore on Star Schema Benchmark (TPC-H derivative):
https://www.percona.com/blog/2020/07/27/clickhouse-and-columnstore-in-the-star-schema-benchmark/

ClickHouse vs. Redshift (price-performance), not many details:
https://altinity.com/blog/2017/7/3/clickhouse-vs-redshift-2

Mark Litwintschik's benchmarks: different systems on different hardware at different times:
https://tech.marksblogg.com/benchmarks.html

SparkSql, Presto, Impala, HAWQ, ClickHouse, GreenPlum on TPC-DS inspired benchmark:
https://programmersought.com/article/7939525005/

ClickHouse vs. kdb+ at Deutsche Bank:
https://youtu.be/0lE-a3TIC5M?t=1036

ClickHouse vs. Redshift on time series data:
http://brandonharris.io/redshift-clickhouse-time-series/

Tensorbase (an experimental prototype) brief benchmark:
https://tensorbase.io/2021/04/20/base_reload.html

ClickHouse vs TimescaleDB vs InfluxDB vs. QuasarDB:
https://blog.quasardb.net/benchmarking-timeseries-ingress

ClickHouse, OpenTSDB, Cassandra, MySQL, InfluxDB and TDEngine on time-series scenario:
https://www.taosdata.com/downloads/TDengine_Testing_Report_cn.pdf

TimescaleDB vs. ClickHouse (use case in real product):
https://splitbee.io/blog/new-pricing

ClickHouse vs DuckDB, HyPer and SQLite in sorting:
https://duckdb.org/2021/08/27/external-sorting.html

All other systems are slower than ClickHouse and DuckDB.
ClickHouse is faster than DuckDB in multi-column sorting, on par in single-column sorting, and slower in external sorting.

ClickHouse vs Redshift vs BigQuery vs Athena vs Presto vs Spark:

A benchmark from the Hydrolix website:

[Screenshot: Hydrolix benchmark results]

Note: Hydrolix itself is a breed of ClickHouse.

Update: they compare results on different hardware. When I run ClickHouse on the same hardware, it takes 0.641 seconds (even with a slightly larger dataset of 1.3 billion records instead of 1.1 billion).

Timescale vs ClickHouse (independent benchmark):
https://pradeepchhetri.xyz/clickhousevstimescaledb/

I've run several tests on ClickHouse releases from v21.8.x to v22.2.3.5-stable on several machines with different architectures (ARM64/AMD64).
[Screenshot from 2022-04-20]

Test results can be seen on: https://clickperf.knat.network/

I've also written a post on how I built multi-arch Docker images for those versions and on the test procedure: https://nova.moe/performance-comparison-clickhouse/ (currently available only in Chinese).

Benchmarks from Manticore search: https://github.com/db-benchmarks/db-benchmarks

Caveat: the benchmarks are tuned in favor of their system:

  • most of the queries are doing full text search or utilizing secondary indices;
  • the table schema for ClickHouse is suboptimal (most of the fields are String).

ClickHouse vs OctoSQL, SPyQL, jq, trdsql, spark-sql, DSQ on JSON processing:
https://github.com/dcmoura/spyql/blob/master/notebooks/json_benchmark.ipynb

Mark Litwintschik's benchmarks: different systems on different hardware at different times: https://tech.marksblogg.com/benchmarks.html

I noticed that in his benchmarks kdb+ performs better than ClickHouse on CPU.
Also, kdb+ seems to be popular in HFT circles.
Is kdb+ expected to be faster in benchmarks?
Does ClickHouse use less RAM than kdb+?
@alexey-milovidov

0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs

Each chip has 64 cores with 4 threads per core: 256 threads per CPU x 4 servers = 1024 threads (though these are special Intel Xeon Phi processors).

0.241 0.826 1.209 1.781 ClickHouse, 3 x c5d.9xlarge cluster

Three c5d.9xlarge EC2 instances were used for this post. Each has 36 vCPUs and 72 GB of RAM (36 x 3 = 108 threads).

Plus, the ClickHouse benchmark is about 3 years old (for ClickHouse, that's a LOT).
The kdb+ benchmark is 5 years old, but kdb+ itself is much older and mostly stabilized.
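
To make the hardware gap concrete, here is a rough sketch of the arithmetic above. It uses only the thread counts and query times quoted in this comment; the variable and function names are made up for illustration. Treating a Xeon Phi thread as equal to an EC2 vCPU is a loud simplifying assumption, so read the output as a sanity check and not as a fair normalization.

```python
# Figures are the ones quoted in the comment above; names are illustrative only.
KDB_THREADS = 64 * 4 * 4       # 64 cores x 4 threads per core x 4 Xeon Phi 7210 CPUs
CLICKHOUSE_THREADS = 36 * 3    # 36 vCPUs x 3 c5d.9xlarge instances

# Query times in seconds for the four benchmark queries, in the order listed.
kdb_times = [0.051, 0.146, 0.047, 0.794]
clickhouse_times = [0.241, 0.826, 1.209, 1.781]


def time_ratios(slower, faster):
    """Return the per-query ratio slower[i] / faster[i]."""
    return [s / f for s, f in zip(slower, faster)]


# How many more hardware threads the kdb+ setup had (naive count, see assumption above).
thread_ratio = KDB_THREADS / CLICKHOUSE_THREADS
# How many times longer ClickHouse took on each query.
ratios = time_ratios(clickhouse_times, kdb_times)

if __name__ == "__main__":
    print(f"kdb+ threads: {KDB_THREADS}, ClickHouse threads: {CLICKHOUSE_THREADS}")
    print(f"thread ratio: {thread_ratio:.2f}x")
    for i, r in enumerate(ratios, start=1):
        print(f"query {i}: ClickHouse took {r:.2f}x as long as kdb+")
```

Comparing each per-query ratio with the thread ratio shows whether the time gap is larger or smaller than the raw thread-count gap, which is as far as these numbers can go without rerunning both systems on the same hardware.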

@fshabashev comparison on similar hardware shows that ClickHouse is faster than kdb+ (for banking applications): #22398 (comment)

I don't know how fair this is. I would be curious, particularly whether ClickHouse is faster at multi-table TPC-H after the recent and upcoming join optimizations. This is my biggest pain point now, though the recent parallel_hash feature has been helpful.

https://starrocks.com/blog/clickhouse_or_starrocks

@alanpaulkwan ClickHouse is bad on TPC-H.

@alexey-milovidov How does ClickHouse compare with Doris on TPC-H?

ClickHouse vs. InfluxDB vs. TimescaleDB on time series workloads from the particle physics domain (ingestion and querying):

https://arxiv.org/pdf/2204.09795.pdf "SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things" (2022) @jalalmostafa

ClickHouse wins in all experiments except for ingestion speed with small batches (< 10k tuples).

@rschu1ze Thanks for referencing!

The code of the benchmark is available here:

https://github.com/jalalmostafa/SciTS

@yingxuanwangxuan I did not see any benchmark of ClickHouse vs Apache Doris, while I know that Doris is using many algorithms from ClickHouse and it should demonstrate good performance.

We should add Apache Doris to ClickBench:
https://github.com/ClickHouse/ClickBench

@jalalmostafa My friend @valyala said that it would make sense to also include VictoriaMetrics in one of your next studies.
He says that it should fit perfectly.

Thank you! VictoriaMetrics is an interesting database. We might include it later in our benchmarks!

@alexey-milovidov How does ClickHouse compare with Doris on TPC-H?

ClickHouse vs InfluxDB vs MySQL: https://itnext.io/experience-sharing-clickhouse-performance-testing-9b913aa0daff

Oops, the author deleted this Medium story.

@linghengqian I want Doris to be included in ClickBench. It is in the list under the Doris/PALO name (which should probably be renamed to Apache Doris).

ClickHouse vs ElasticSearch:

Master's thesis (2022), "Evaluating ClickHouse as a Big Data Processing Solution for IoT-Telemetry", Adrian Göransson, Oskar Wändesjö: https://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=9078076&fileOId=9078077

Comparing performance with the baseline of storing raw files in Minio.

SigNoz (an observability platform based on ClickHouse) vs. ELK and Loki:

https://signoz.io/blog/logs-performance-benchmark/

Great list. Drawing heavily on this page, I'm trying to put together a meta-benchmarks page for all databases:
https://www.timestored.com/time-series-data/time-series-database-benchmarks
If you have any feedback, please let me know.

When I started adding red marks for vendor-chosen benchmarks, almost everything turned red, and most of them are very closed, with no source code or repeatability. Well done on being very open with the ClickHouse benchmarks.

@ryanhamilton A correction to the article:

Clickbench is a large suite of benchmarks produced by clickhouse themselves. The focus is not time-series but a wide range of queries. They are being very transparent and open, i.e. on some queries they are beaten but the benchmarks only include open source choices.

ClickBench includes both open-source and proprietary DBMS, both self-managed and Cloud. For self-managed proprietary DBMS we include Kinetica and SingleStore. For Cloud proprietary DBMS we include Snowflake, Redshift, Athena, Aurora, SingleStore, Bytehouse.

ClickHouse vs. DuckDB on Parquet files: duckdb/duckdb#6478
ClickHouse vs. DuckDB on sorting in external memory: https://duckdb.org/2021/08/27/external-sorting.html

ClickHouse vs. Redshift on analysing the Ethereum blockchain: https://clickhouse.com/blog/redshift-vs-clickhouse-comparison

ClickHouse vs. DuckDB on local data analysis: https://www.vantage.sh/blog/clickhouse-local-vs-duckdb

ClickHouse, ByConity, Doris, and Presto on TPC-DS: https://www.infoq.cn/article/SQCArsXNtZ9N1vEbLBqx

ClickHouse, TimescaleDB, QuestDB, InfluxDB, eXtremeDB:
https://www.mcobject.com/wp-content/uploads/dlm_uploads/2023/08/TSM-Bench-Benchmarking-Time-Series-Database-Systems.pdf

@alexey-milovidov This benchmark is based on results from the TSM-Bench repository, which used InfluxDB 1.7 for the comparison against ClickHouse. There are 3 mainstream versions of InfluxDB:

  • InfluxDB 1 - The latest version is 1.8, and it is written in Go
  • InfluxDB 2 - The successor to InfluxDB 1, also written in Go. Though it is open source, the official Helm chart is somehow still using InfluxDB 1.8 as of now (December 2023)
  • InfluxDB 3 - The rewritten version, written in Rust. This one is closed source as of now (December 2023). They also claim InfluxDB 3 is 45x faster than InfluxDB 1.8.