A list of benchmarks

Question

alexey-milovidov opened this issue 3 years ago · comments

Alexey Milovidov commented 3 years ago

Post every link to benchmark here.

Alexey Milovidov · Answer 1 · Wed Mar 31 2021 12:49:33 GMT+0800 (China Standard Time)

ClickHouse vs Elasticsearch: https://my.oschina.net/taogang/blog/4965018

Alexey Milovidov · Answer 2 · Wed Mar 31 2021 12:50:17 GMT+0800 (China Standard Time)

Brown university: https://github.com/crottyan/mgbench
Abandoned, but still quite fresh.

Alexey Milovidov · Answer 3 · Wed Mar 31 2021 12:51:11 GMT+0800 (China Standard Time)

CERN ClickHouse vs InfluxDB: http://cds.cern.ch/record/2667383/

Alexey Milovidov · Answer 4 · Thu Apr 01 2021 22:28:07 GMT+0800 (China Standard Time)

https://fivetran.com/blog/warehouse-benchmark (without ClickHouse)

Alexey Milovidov · Answer 5 · Thu Apr 01 2021 22:28:50 GMT+0800 (China Standard Time)

h2o.ai dataframe like benchmark: https://h2oai.github.io/db-benchmark/

Alexey Milovidov · Answer 6 · Thu Apr 01 2021 22:30:09 GMT+0800 (China Standard Time)

AMPLab benchmark: https://amplab.cs.berkeley.edu/benchmark/
Outdated and abandoned, does not include ClickHouse.

Alexey Milovidov · Answer 7 · Thu Apr 01 2021 22:33:16 GMT+0800 (China Standard Time)

TSBS benchmark: https://altinity.com/blog/clickhouse-for-time-series
(2018 version, does not have read in order optimization)

ClickHouse vs Kudu on TSBS (from Cloudera, in favor of Kudu):
https://blog.cloudera.com/benchmarking-time-series-workloads-on-apache-kudu-using-tsbs/

Alexey Milovidov · Answer 8 · Thu Apr 01 2021 22:35:09 GMT+0800 (China Standard Time)

ClickHouse vs Spark on Wikipedia data:
https://www.percona.com/blog/2017/02/13/clickhouse-new-opensource-columnar-database/

Alexey Milovidov · Answer 9 · Thu Apr 01 2021 22:35:47 GMT+0800 (China Standard Time)

ClickHouse vs. MariaDB ColumnStore on Star Schema Benchmark (TPC-H derivative):
https://www.percona.com/blog/2020/07/27/clickhouse-and-columnstore-in-the-star-schema-benchmark/

Alexey Milovidov · Answer 10 · Thu Apr 01 2021 22:37:12 GMT+0800 (China Standard Time)

ClickHouse vs. Redshift on NYC taxi dataset:
https://altinitydb.medium.com/clickhouse-vs-amazon-redshift-benchmark-e223429f4f95

Alexey Milovidov · Answer 11 · Thu Apr 01 2021 22:41:28 GMT+0800 (China Standard Time)

ClickHouse vs. Redshift (price-performance), not many details:
https://altinity.com/blog/2017/7/3/clickhouse-vs-redshift-2

Alexey Milovidov · Answer 12 · Thu Apr 01 2021 22:44:07 GMT+0800 (China Standard Time)

PostgreSQL vs ClickHouse via clickhouse_fdw in PostgreSQL:
https://www.percona.com/blog/2019/05/09/improving-olap-workload-performance-for-postgresql-with-clickhouse-database/

Alexey Milovidov · Answer 13 · Thu Apr 01 2021 22:44:58 GMT+0800 (China Standard Time)

ClickHouse vs. MariaDB Column Store vs. Spark vs. MySQL on Wikipedia data:
https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/

Alexey Milovidov · Answer 14 · Thu Apr 01 2021 22:45:34 GMT+0800 (China Standard Time)

ClickHouse vs. clickhouse_fdw in PostgreSQL:
https://www.percona.com/blog/2019/05/01/benchmark-clickhouse-database-and-clickhousedb_fdw/

Alexey Milovidov · Answer 15 · Thu Apr 01 2021 22:48:44 GMT+0800 (China Standard Time)

Mark Litwintschik's benchmarks: different systems on different hardware at different times:
https://tech.marksblogg.com/benchmarks.html

Alexey Milovidov · Answer 16 · Thu Apr 01 2021 22:50:17 GMT+0800 (China Standard Time)

SparkSql, Presto, Impala, HAWQ, ClickHouse, GreenPlum on TPC-DS inspired benchmark:
https://programmersought.com/article/7939525005/

Alexey Milovidov · Answer 17 · Thu Apr 01 2021 22:52:47 GMT+0800 (China Standard Time)

ClickHouse vs. kdb+ at Deutsche Bank:
https://youtu.be/0lE-a3TIC5M?t=1036

alex-zaitsev · Answer 18 · Thu Apr 01 2021 23:37:45 GMT+0800 (China Standard Time)

ClickHouse vs. Redshift on NYC taxi dataset:
https://altinitydb.medium.com/clickhouse-vs-amazon-redshift-benchmark-e223429f4f95

Superseded by https://altinity.com/blog/clickhouse-and-redshift-face-off-again-in-nyc-taxi-rides-benchmark

alex-zaitsev · Answer 19 · Thu Apr 01 2021 23:38:15 GMT+0800 (China Standard Time)

ClickHouse vs RedShift on fintech use case:
https://altinity.com/blog/clickhouse-vs-redshift-performance-for-fintech-risk-management

alex-zaitsev · Answer 20 · Thu Apr 01 2021 23:38:36 GMT+0800 (China Standard Time)

ClickHouse vs Druid vs Rockset:
https://altinity.com/blog/clickhouse-nails-cost-efficiency-challenge-against-druid-rockset

alex-zaitsev · Answer 21 · Thu Apr 01 2021 23:40:27 GMT+0800 (China Standard Time)

ClickHouse vs ScyllaDB on 500B rows dataset:
https://altinity.com/blog/2020/1/1/clickhouse-cost-efficiency-in-action-analyzing-500-billion-rows-on-an-intel-nuc

Alexey Milovidov · Answer 22 · Fri Apr 30 2021 20:10:54 GMT+0800 (China Standard Time)

ClickHouse vs. Redshift on time series data:
http://brandonharris.io/redshift-clickhouse-time-series/

Alexey Milovidov · Answer 23 · Sun May 02 2021 19:46:25 GMT+0800 (China Standard Time)

Tensorbase (an experimental prototype) brief benchmark:
https://tensorbase.io/2021/04/20/base_reload.html

Kimmo Linna · Answer 24 · Wed May 05 2021 01:57:16 GMT+0800 (China Standard Time)

ClickHouse vs. InfluxDB vs. Timescale vs. OpenTSDB:
https://www.sciencedirect.com/science/article/pii/S1877050919310439/pdf?md5=ecc390cfc9d0be432ca0c218985c94d5&pid=1-s2.0-S1877050919310439-main.pdf

Alexey Milovidov · Answer 25 · Wed Jun 09 2021 08:11:33 GMT+0800 (China Standard Time)

ClickHouse vs TimescaleDB vs InfluxDB vs. QuasarDB:
https://blog.quasardb.net/benchmarking-timeseries-ingress

Alexey Milovidov · Answer 26 · Wed Jun 09 2021 08:12:49 GMT+0800 (China Standard Time)

ClickHouse vs. TimescaleDB:
https://twitter.com/ThisIsFernandez/status/1402369109594148865

Alexey Milovidov · Answer 27 · Sat Jun 12 2021 09:33:49 GMT+0800 (China Standard Time)

ClickHouse, OpenTSDB, Cassandra, MySQL, InfluxDB and TDEngine on time-series scenario:
https://www.taosdata.com/downloads/TDengine_Testing_Report_cn.pdf

Alexey Milovidov · Answer 28 · Wed Aug 04 2021 06:26:48 GMT+0800 (China Standard Time)

TimescaleDB vs. ClickHouse (use case in real product):
https://splitbee.io/blog/new-pricing

Alexey Milovidov · Answer 29 · Wed Sep 01 2021 10:23:37 GMT+0800 (China Standard Time)

ClickHouse vs DuckDB, HyPer and SQLite in sorting:
https://duckdb.org/2021/08/27/external-sorting.html

All other systems are slower than ClickHouse and DuckDB.
ClickHouse is faster than DuckDB at multi-column sorting, on par at single-column sorting, and slower at external sorting.

Alexey Milovidov · Answer 30 · Fri Sep 10 2021 22:58:17 GMT+0800 (China Standard Time)

TimescaleDB vs ClickHouse by GitLab:
https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/
https://www.youtube.com/watch?v=cMdQsxolcqc

Alexey Milovidov · Answer 31 · Sat Sep 18 2021 20:40:32 GMT+0800 (China Standard Time)

QuestDB vs ClickHouse on time series ingestion performance:
https://questdb.io/blog/2021/06/16/high-cardinality-time-series-data-performance/

Alexey Milovidov · Answer 32 · Fri Oct 22 2021 02:36:16 GMT+0800 (China Standard Time)

Timescale vs ClickHouse (from their blog):
https://blog.timescale.com/blog/what-is-clickhouse-how-does-it-compare-to-postgresql-and-timescaledb-and-how-does-it-perform-for-time-series-data/

Alexey Milovidov · Answer 33 · Mon Nov 01 2021 04:04:44 GMT+0800 (China Standard Time)

ClickHouse vs Redshift vs BigQuery vs Athena vs Presto vs Spark:

A benchmark from Hydrolix website:

Note: Hydrolix itself is a breed of ClickHouse.

Update: they compare results on different hardware. When I ran ClickHouse on the same hardware, it took 0.641 seconds (even with a slightly larger dataset of 1.3 billion records instead of 1.1 billion).

Alexey Milovidov · Answer 34 · Tue Nov 02 2021 08:20:36 GMT+0800 (China Standard Time)

Timescale vs ClickHouse (independent benchmark):
https://pradeepchhetri.xyz/clickhousevstimescaledb/

inv2004 · Answer 35 · Thu Jan 20 2022 06:45:54 GMT+0800 (China Standard Time)

https://github.com/inv2004/100m_taxi_bench

Alexey Milovidov · Answer 36 · Sat Jan 22 2022 22:23:42 GMT+0800 (China Standard Time)

Newer ClickHouse vs. Timescale benchmark from GitLab:
https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/4

Alexey Milovidov · Answer 37 · Sat Jan 22 2022 22:28:46 GMT+0800 (China Standard Time)

ClickHouse vs MongoDB from GitLab:
https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/12

Alexey Milovidov · Answer 38 · Fri Mar 18 2022 22:06:08 GMT+0800 (China Standard Time)

ClickHouse vs ElasticSearch from Alibaba:
https://www.alibabacloud.com/blog/clickhouse-vs--elasticsearch_597898?spm=a2c65.11461447.0.0.507e2f28FHNoxt

Alexey Milovidov · Answer 39 · Thu Apr 14 2022 21:14:17 GMT+0800 (China Standard Time)

ClickHouse vs SingleStore:
https://data-sleek.com/blog/singlestore-vs-clickhouse-benchmarks/

(needs validation)

Alexey Milovidov · Answer 40 · Thu Apr 14 2022 21:33:15 GMT+0800 (China Standard Time)

ClickHouse vs SingleStore on data loading:
https://altinity.com/blog/loading-100b-rows-in-minutes-in-altinity-cloud

Nova Kwok · Answer 41 · Wed Apr 20 2022 09:59:49 GMT+0800 (China Standard Time)

I've run several tests on ClickHouse releases ranging from v21.8.x to v22.2.3.5-stable on several machines with different architectures (ARM64/AMD64).

Test results can be seen on: https://clickperf.knat.network/

I've also written a post on how I built multi-arch Docker images for those versions and on the procedure for these tests: https://nova.moe/performance-comparison-clickhouse/ (currently only in Chinese).

Alexey Milovidov · Answer 42 · Sat May 28 2022 10:26:35 GMT+0800 (China Standard Time)

Benchmarks from Manticore search: https://github.com/db-benchmarks/db-benchmarks

Caveat: the benchmarks are tuned in favor of their system:

most of the queries are doing full text search or utilizing secondary indices;
the table schema for ClickHouse is suboptimal (most of the fields are String).

Alexey Milovidov · Answer 43 · Sat May 28 2022 11:06:07 GMT+0800 (China Standard Time)

ClickHouse vs OctoSQL, SPyQL, jq, trdsql, spark-sql, DSQ on JSON processing:
https://github.com/dcmoura/spyql/blob/master/notebooks/json_benchmark.ipynb

Alexey Milovidov · Answer 44 · Fri Jun 03 2022 09:16:11 GMT+0800 (China Standard Time)

A benchmark from QuestDB shows obnoxious results:
https://questdb.io/blog/2022/05/26/query-benchmark-questdb-versus-clickhouse-timescale/

Fedor Shabashev · Answer 45 · Thu Jun 16 2022 23:40:18 GMT+0800 (China Standard Time)

Mark Litwintschik's benchmarks: different systems on different hardware at different times: https://tech.marksblogg.com/benchmarks.html

I noticed that in his benchmarks kdb is performing better than Clickhouse on a CPU.
Plus kdb seems to be popular in HFT circles.
Is it expected for kdb to be faster in benchmarks?
Does Clickhouse use less RAM than kdb?
@alexey-milovidov

UnamedRus · Answer 46 · Thu Jun 16 2022 23:47:28 GMT+0800 (China Standard Time)

0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs

Each chip comes with 64 cores, each with 4 threads (256 threads per CPU x 4 servers = 1024 threads, though these are special Intel Xeon Phi processors).

0.241 0.826 1.209 1.781 ClickHouse, 3 x c5d.9xlarge cluster

Three c5d.9xlarge EC2 instances were used for this post. Each contains 36 vCPUs and 72 GB of RAM (36 x 3 = 108 threads).

Plus, the ClickHouse benchmark is about 3 years old (for ClickHouse, that's a LOT).
The kdb+ benchmark is 5 years old, but kdb+ itself is much older and mostly stabilized.
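
The arithmetic above can be sketched in a few lines of Python. This is only an illustration built from the numbers quoted in this comment; the variable names are made up for the example, and treating Xeon Phi hardware threads and EC2 vCPUs as comparable units is a rough simplifying assumption, not a fair normalization.

```python
# Figures quoted in the comment above; variable names are illustrative only.
KDB_THREADS_PER_CHIP = 64 * 4   # Xeon Phi 7210: 64 cores x 4 threads each
KDB_CHIPS = 4                   # 4 CPUs in the kdb+/q setup
CH_VCPUS_PER_INSTANCE = 36      # c5d.9xlarge
CH_INSTANCES = 3                # 3-node ClickHouse cluster

kdb_threads = KDB_THREADS_PER_CHIP * KDB_CHIPS
ch_threads = CH_VCPUS_PER_INSTANCE * CH_INSTANCES

# Query times in seconds, in the order listed in the comment.
kdb_times = [0.051, 0.146, 0.047, 0.794]
ch_times = [0.241, 0.826, 1.209, 1.781]

# How many times more threads the kdb+ setup had (assumes threads ~ vCPUs).
thread_ratio = kdb_threads / ch_threads

# How many times slower ClickHouse was on each query.
time_ratios = [ch / kdb for ch, kdb in zip(ch_times, kdb_times)]

print(f"kdb+ threads: {kdb_threads}, ClickHouse threads: {ch_threads}")
print(f"thread ratio: {thread_ratio:.2f}x")
for i, ratio in enumerate(time_ratios, start=1):
    print(f"query {i}: ClickHouse took {ratio:.1f}x as long as kdb+")
```

Putting the thread ratio next to the per-query time ratios shows how much of the gap could plausibly come from hardware alone, which is the point the comment makes.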

Alexey Milovidov · Answer 47 · Sat Jun 18 2022 12:46:26 GMT+0800 (China Standard Time)

@fshabashev comparison on similar hardware shows that ClickHouse is faster than kdb+ (for banking applications): #22398 (comment)

alanpaulkwan · Answer 48 · Sat Jun 18 2022 14:59:39 GMT+0800 (China Standard Time)

I don't know how fair this is. I would be curious, particularly whether ClickHouse is faster at multi-table TPC-H after the recent / impending join optimizations. This is my biggest pain point now, although the recent parallel_hash feature has been helpful.

https://starrocks.com/blog/clickhouse_or_starrocks

Alexey Milovidov · Answer 49 · Sun Jun 19 2022 02:09:41 GMT+0800 (China Standard Time)

@alanpaulkwan ClickHouse is bad on TPC-H.

yingxuanwangxuan · Answer 50 · Thu Aug 04 2022 15:46:38 GMT+0800 (China Standard Time)

@alexey-milovidov how does ClickHouse compare with Doris on TPC-H?

Robert Schulze · Answer 51 · Mon Aug 15 2022 03:21:10 GMT+0800 (China Standard Time)

ClickHouse vs. InfluxDB vs. TimescaleDB on time series workloads from the particle physics domain (ingestion and querying):

https://arxiv.org/pdf/2204.09795.pdf "SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things" (2022) @jalalmostafa

ClickHouse wins in all experiments except for ingestion speed with small batches (< 10k tuples).

Jalal Mostafa · Answer 52 · Mon Aug 15 2022 04:24:42 GMT+0800 (China Standard Time)

ClickHouse vs. InfluxDB vs. TimescaleDB on time series workloads from the particle physics domain (ingestion and querying):

https://arxiv.org/pdf/2204.09795.pdf "SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things" (2022) @jalalmostafa

ClickHouse wins in all experiments except for ingestion speed with small batches (< 10k tuples).

@rschu1ze Thanks for referencing!

The code of the benchmark is available here:

https://github.com/jalalmostafa/SciTS

Alexey Milovidov · Answer 53 · Mon Aug 15 2022 13:09:14 GMT+0800 (China Standard Time)

@jalalmostafa Thank you! Also added here: ClickHouse/ClickBench#20

Alexey Milovidov · Answer 54 · Mon Aug 15 2022 13:10:16 GMT+0800 (China Standard Time)

@yingxuanwangxuan I have not seen any benchmark of ClickHouse vs Apache Doris, though I know that Doris uses many algorithms from ClickHouse and should demonstrate good performance.

Alexey Milovidov · Answer 55 · Mon Aug 15 2022 13:10:36 GMT+0800 (China Standard Time)

We should add Apache Doris to ClickBench:
https://github.com/ClickHouse/ClickBench

Alexey Milovidov · Answer 56 · Mon Aug 15 2022 16:36:16 GMT+0800 (China Standard Time)

@jalalmostafa My friend @valyala said that it makes sense to also add VictoriaMetrics in one of the next studies.
He says that it should fit perfectly.

Jalal Mostafa · Answer 57 · Mon Aug 15 2022 18:30:50 GMT+0800 (China Standard Time)

@jalalmostafa My friend @valyala said that it makes sense to also add VictoriaMetrics in one of the next studies. He says that it should fit perfectly.

Thank you! VictoriaMetrics is an interesting database. We might include it later in our benchmarks!

Alexey Milovidov · Answer 58 · Fri Aug 26 2022 22:06:03 GMT+0800 (China Standard Time)

ClickHouse vs InfluxDB vs MySQL: https://itnext.io/experience-sharing-clickhouse-performance-testing-9b913aa0daff

Ling Hengqian · Answer 59 · Tue Sep 06 2022 11:18:29 GMT+0800 (China Standard Time)

@alexey-milovidov how does ClickHouse compare with Doris on TPC-H?

@yingxuanwangxuan
I don't see an entry for this database at https://github.com/ClickHouse/ClickBench#systems-included . Is this because Apache Doris has already run the benchmark itself at https://doris.apache.org/docs/benchmark ? But it tests on a completely different machine than ClickBench uses. Perhaps a new issue should be opened in the ClickBench tracker?

Ramazan Polat · Answer 60 · Tue Sep 06 2022 18:11:30 GMT+0800 (China Standard Time)

ClickHouse vs InfluxDB vs MySQL: https://itnext.io/experience-sharing-clickhouse-performance-testing-9b913aa0daff

Oops, the author deleted this Medium story.

Alexey Milovidov · Answer 61 · Fri Sep 09 2022 11:06:06 GMT+0800 (China Standard Time)

@linghengqian I want Doris to be included in ClickBench. It is in the list under the Doris/PALO name (we should probably rename it to Apache Doris).

Alexey Milovidov · Answer 62 · Fri Oct 21 2022 11:24:50 GMT+0800 (China Standard Time)

ClickHouse vs MariaDB Column Store:
https://medium.com/datadenys/mariadb-column-store-installation-and-quick-overview-9911435e4574

Alexey Milovidov · Answer 63 · Sun Jan 29 2023 05:54:25 GMT+0800 (China Standard Time)

ClickHouse vs ElasticSearch:

MASTER’S THESIS 2022 "Evaluating ClickHouse as a Big Data Processing Solution for IoT-Telemetry" - Adrian Göransson, Oskar Wändesjö: https://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=9078076&fileOId=9078077

Comparing performance with the baseline of storing raw files in Minio.

Alexey Milovidov · Answer 64 · Sun Jan 29 2023 05:56:49 GMT+0800 (China Standard Time)

SigNoz (an observability platform based on ClickHouse) vs. ELK and Loki:

https://signoz.io/blog/logs-performance-benchmark/

Alexey Milovidov · Answer 65 · Sun Jan 29 2023 06:21:40 GMT+0800 (China Standard Time)

ClickHouse vs. Postgres:

https://aiven.io/blog/why-you-should-offload-your-pg-analytical-workloads-to-clickhouse

Ryan · Answer 66 · Wed Feb 15 2023 23:33:16 GMT+0800 (China Standard Time)

Great list. Heavily based on this page I'm trying to gather a meta benchmarks page for all databases:
https://www.timestored.com/time-series-data/time-series-database-benchmarks
If you have any feedback, please let me know.

When I started adding red marks for vendor-chosen benchmarks, almost everything turned red, and most of them are very closed, with no source code or repeatability. Well done on being very open with the ClickHouse benchmarks.

Alexey Milovidov · Answer 67 · Sat Feb 18 2023 08:42:48 GMT+0800 (China Standard Time)

@ryanhamilton A correction to the article:

Clickbench is a large suite of benchmarks produced by clickhouse themselves. The focus is not time-series but a wide range of queries. They are being very transparent and open, i.e. on some queries they are beaten but the benchmarks only include open source choices.

ClickBench includes both open-source and proprietary DBMS, both self-managed and Cloud. For self-managed proprietary DBMS we include Kinetica and SingleStore. For Cloud proprietary DBMS we include Snowflake, Redshift, Athena, Aurora, SingleStore, Bytehouse.

Alexey Milovidov · Answer 68 · Wed Mar 08 2023 08:09:53 GMT+0800 (China Standard Time)

ClickHouse vs. DuckDB on Parquet files: duckdb/duckdb#6478
ClickHouse vs. DuckDB on sorting in external memory: https://duckdb.org/2021/08/27/external-sorting.html

Alexey Milovidov · Answer 69 · Tue Apr 11 2023 02:58:12 GMT+0800 (China Standard Time)

ClickHouse vs. RedShift on analysing Ethereum blockchain: https://clickhouse.com/blog/redshift-vs-clickhouse-comparison

Alexey Milovidov · Answer 70 · Mon May 29 2023 05:25:21 GMT+0800 (China Standard Time)

ClickHouse vs. DuckDB on local data analysis: https://www.vantage.sh/blog/clickhouse-local-vs-duckdb

Alexey Milovidov · Answer 71 · Wed Jun 07 2023 11:13:04 GMT+0800 (China Standard Time)

ClickHouse, ByConity, Doris, and Presto on TPC-DS: https://www.infoq.cn/article/SQCArsXNtZ9N1vEbLBqx

Machhindra · Answer 72 · Thu Nov 16 2023 00:04:32 GMT+0800 (China Standard Time)

Clickhouse vs Qdrant (SVD) - https://blog.arguflow.ai/posts/clickhouse-vs-vector-database-qdrant

Alexey Milovidov · Answer 73 · Tue Dec 19 2023 01:05:55 GMT+0800 (China Standard Time)

ClickHouse, TimescaleDB, QuestDB, InfluxDB, eXtremeDB:

https://www.mcobject.com/wp-content/uploads/dlm_uploads/2023/08/TSM-Bench-Benchmarking-Time-Series-Database-Systems.pdf

Joeky · Answer 74 · Tue Dec 19 2023 02:59:02 GMT+0800 (China Standard Time)

ClickHouse, TimescaleDB, QuestDB, InfluxDB, eXtremeDB:
https://www.mcobject.com/wp-content/uploads/dlm_uploads/2023/08/TSM-Bench-Benchmarking-Time-Series-Database-Systems.pdf

@alexey-milovidov This benchmark is based on the results from the TSM-Bench repository, which used InfluxDB 1.7 for the benchmarks against ClickHouse. There are 3 mainstream versions of InfluxDB:

InfluxDB 1 - The latest version is 1.8, and it is written in Go
InfluxDB 2 - The reworked version of InfluxDB 1, also written in Go (the Rust rewrite is InfluxDB 3). Though it is open source, the official helm chart is somehow using InfluxDB 1.8 as of now (December 2023)
InfluxDB 3 - This one is closed source as of now (December 2023). They also claimed InfluxDB 3 is 45x faster than InfluxDB 1.8.

Alexey Milovidov · Answer 75 · Sat Apr 27 2024 10:16:51 GMT+0800 (China Standard Time)

https://www.timestored.com/data/time-series-database-benchmarks