A list of benchmarks
alexey-milovidov opened this issue · comments
Post every link to benchmark here.
ClickHouse vs Elasticsearch: https://my.oschina.net/taogang/blog/4965018
Brown university: https://github.com/crottyan/mgbench
Abandoned, but still quite fresh.
CERN ClickHouse vs InfluxDB: http://cds.cern.ch/record/2667383/
https://fivetran.com/blog/warehouse-benchmark (without ClickHouse)
h2o.ai dataframe-like benchmark: https://h2oai.github.io/db-benchmark/
AMPLab benchmark: https://amplab.cs.berkeley.edu/benchmark/
Outdated and abandoned, does not include ClickHouse.
TSBS benchmark: https://altinity.com/blog/clickhouse-for-time-series
(2018 version, does not have read in order optimization)
ClickHouse vs Kudu on TSBS (from Cloudera, in favor of Kudu):
https://blog.cloudera.com/benchmarking-time-series-workloads-on-apache-kudu-using-tsbs/
ClickHouse vs Spark on Wikipedia data:
https://www.percona.com/blog/2017/02/13/clickhouse-new-opensource-columnar-database/
ClickHouse vs. MariaDB ColumnStore on Star Schema Benchmark (TPC-H derivative):
https://www.percona.com/blog/2020/07/27/clickhouse-and-columnstore-in-the-star-schema-benchmark/
ClickHouse vs. Redshift on NYC taxi dataset:
https://altinitydb.medium.com/clickhouse-vs-amazon-redshift-benchmark-e223429f4f95
ClickHouse vs. Redshift (price-performance), not many details:
https://altinity.com/blog/2017/7/3/clickhouse-vs-redshift-2
PostgreSQL vs ClickHouse via clickhouse_fdw in PostgreSQL:
https://www.percona.com/blog/2019/05/09/improving-olap-workload-performance-for-postgresql-with-clickhouse-database/
ClickHouse vs. MariaDB Column Store vs. Spark vs. MySQL on Wikipedia data:
https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
ClickHouse vs. clickhouse_fdw in PostgreSQL:
https://www.percona.com/blog/2019/05/01/benchmark-clickhouse-database-and-clickhousedb_fdw/
Mark Litwintschik's benchmarks: different systems on different hardware at different times:
https://tech.marksblogg.com/benchmarks.html
SparkSql, Presto, Impala, HAWQ, ClickHouse, GreenPlum on TPC-DS inspired benchmark:
https://programmersought.com/article/7939525005/
ClickHouse vs. kdb+ at Deutsche Bank:
https://youtu.be/0lE-a3TIC5M?t=1036
ClickHouse vs. Redshift on NYC taxi dataset:
https://altinitydb.medium.com/clickhouse-vs-amazon-redshift-benchmark-e223429f4f95
Superseded by https://altinity.com/blog/clickhouse-and-redshift-face-off-again-in-nyc-taxi-rides-benchmark
ClickHouse vs RedShift on fintech use case:
https://altinity.com/blog/clickhouse-vs-redshift-performance-for-fintech-risk-management
ClickHouse vs Druid vs Rockset:
https://altinity.com/blog/clickhouse-nails-cost-efficiency-challenge-against-druid-rockset
ClickHouse vs ScyllaDB on 500B rows dataset:
https://altinity.com/blog/2020/1/1/clickhouse-cost-efficiency-in-action-analyzing-500-billion-rows-on-an-intel-nuc
ClickHouse vs. Redshift on time series data:
http://brandonharris.io/redshift-clickhouse-time-series/
Tensorbase (an experimental prototype) brief benchmark:
https://tensorbase.io/2021/04/20/base_reload.html
ClickHouse vs. InfluxDB vs. Timescale vs. OpenTSDB:
https://www.sciencedirect.com/science/article/pii/S1877050919310439/pdf?md5=ecc390cfc9d0be432ca0c218985c94d5&pid=1-s2.0-S1877050919310439-main.pdf
ClickHouse vs TimescaleDB vs InfluxDB vs. QuasarDB:
https://blog.quasardb.net/benchmarking-timeseries-ingress
ClickHouse vs. TimescaleDB:
https://twitter.com/ThisIsFernandez/status/1402369109594148865
ClickHouse, OpenTSDB, Cassandra, MySQL, InfluxDB and TDEngine on time-series scenario:
https://www.taosdata.com/downloads/TDengine_Testing_Report_cn.pdf
TimescaleDB vs. ClickHouse (use case in real product):
https://splitbee.io/blog/new-pricing
ClickHouse vs DuckDB, HyPer and SQLite in sorting:
https://duckdb.org/2021/08/27/external-sorting.html
All other systems are slower than ClickHouse and DuckDB.
ClickHouse is faster than DuckDB in multi-column sorting, shows no difference in single-column sorting, and is slower in external sorting.
TimescaleDB vs ClickHouse by GitLab:
https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/
https://www.youtube.com/watch?v=cMdQsxolcqc
QuestDB vs ClickHouse on time series ingestion performance:
https://questdb.io/blog/2021/06/16/high-cardinality-time-series-data-performance/
Timescale vs ClickHouse (from their blog):
https://blog.timescale.com/blog/what-is-clickhouse-how-does-it-compare-to-postgresql-and-timescaledb-and-how-does-it-perform-for-time-series-data/
ClickHouse vs Redshift vs BigQuery vs Athena vs Presto vs Spark:
A benchmark from Hydrolix website:
Note: Hydrolix itself is a breed of ClickHouse.
Update: they compare results on different hardware. When I run ClickHouse on the same hardware, it takes 0.641 seconds (even when using a slightly larger dataset of 1.3 billion records instead of 1.1 billion).
Timescale vs ClickHouse (independent benchmark):
https://pradeepchhetri.xyz/clickhousevstimescaledb/
Newer ClickHouse vs. Timescale benchmark from GitLab:
https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/4
ClickHouse vs MongoDB from GitLab:
https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/12
ClickHouse vs ElasticSearch from Alibaba:
https://www.alibabacloud.com/blog/clickhouse-vs--elasticsearch_597898?spm=a2c65.11461447.0.0.507e2f28FHNoxt
ClickHouse vs SingleStore:
https://data-sleek.com/blog/singlestore-vs-clickhouse-benchmarks/
(needs validation)
ClickHouse vs SingleStore on data loading:
https://altinity.com/blog/loading-100b-rows-in-minutes-in-altinity-cloud
I've run several tests on ClickHouse releases from v21.8.x to v22.2.3.5-stable on several machines with different architectures (ARM64/AMD64).
Test results can be seen on: https://clickperf.knat.network/
I've also written a post on how I built multi-arch Docker images for those versions and on the procedure of these tests: https://nova.moe/performance-comparison-clickhouse/ (currently only in Chinese).
Benchmarks from Manticore search: https://github.com/db-benchmarks/db-benchmarks
Caveat: the benchmarks are tuned in favor of their system:
- most of the queries are doing full text search or utilizing secondary indices;
- the table schema for ClickHouse is made suboptimal (most of the fields are String).
ClickHouse vs OctoSQL, SPyQL, jq, trdsql, spark-sql, DSQ on JSON processing:
https://github.com/dcmoura/spyql/blob/master/notebooks/json_benchmark.ipynb
A benchmark from QuestDB shows obnoxious results:
https://questdb.io/blog/2022/05/26/query-benchmark-questdb-versus-clickhouse-timescale/
Mark Litwintschik's benchmarks: different systems on different hardware at different times: https://tech.marksblogg.com/benchmarks.html
I noticed that in his benchmarks kdb is performing better than Clickhouse on a CPU.
Plus kdb seems to be popular in HFT circles.
Is it expected for kdb to be faster in benchmarks?
Does Clickhouse use less RAM than kdb?
@alexey-milovidov
0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs
chip comes with 64 cores, each with 4 threads. (256 threads per CPU x 4 servers = 1024 threads (but it's special intel Phi processors))
0.241 0.826 1.209 1.781 ClickHouse, 3 x c5d.9xlarge cluster
three c5d.9xlarge EC2 instances for this post. They each contain 36 vCPUs, 72 GB of RAM, (36*3 = 108 threads)
Plus ClickHouse benchmark is like 3 years old. (For ClickHouse it's a LOT)
KDB benchmark is 5 years old, but KDB itself much older and mostly stabilized.
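The thread counts quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the two result rows, the variable names are illustrative, and treating the four figures per row as per-query times in the same order is an assumption. Thread count is also only a rough proxy, since Xeon Phi threads and c5d vCPUs are not equivalent.

```python
# Figures copied from the two result rows quoted above; names are illustrative.
# Assumption: the four values per row are query times in the same order.
kdb_times = [0.051, 0.146, 0.047, 0.794]         # kdb+/q, 4 x Xeon Phi 7210
clickhouse_times = [0.241, 0.826, 1.209, 1.781]  # ClickHouse, 3 x c5d.9xlarge

kdb_threads = 64 * 4 * 4      # 64 cores x 4 threads per core x 4 CPUs
clickhouse_threads = 36 * 3   # 36 vCPUs x 3 instances

# How many more hardware threads the kdb+ setup had available.
thread_ratio = kdb_threads / clickhouse_threads

# Per-query ratio of ClickHouse time to kdb+ time.
time_ratios = [ch / kdb for ch, kdb in zip(clickhouse_times, kdb_times)]

print(f"kdb+ threads: {kdb_threads}, ClickHouse threads: {clickhouse_threads}")
print(f"thread ratio: {thread_ratio:.2f}")
for i, ratio in enumerate(time_ratios, start=1):
    print(f"query {i}: ClickHouse time / kdb+ time = {ratio:.2f}")
```

Comparing the thread ratio with the per-query time ratios gives a rough sense of how much of the gap could come from hardware alone; it says nothing about software age or architecture differences.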
@fshabashev comparison on similar hardware shows that ClickHouse is faster than kdb+ (for banking applications): #22398 (comment)
I don't know how fair this is. I would be curious, in particular, whether ClickHouse is faster on multi-table TPC-H after the recent and impending join optimizations. This is my biggest pain point now, and even the recent parallel_hash feature has been helpful.
@alanpaulkwan ClickHouse is bad on TPC-H.
@alexey-milovidov How does ClickHouse compare with Doris on TPC-H?
ClickHouse vs. InfluxDB vs. TimescaleDB on time series workloads from the particle physics domain (ingestion and querying):
https://arxiv.org/pdf/2204.09795.pdf "SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things" (2022) @jalalmostafa
ClickHouse wins in all experiments except for ingestion speed with small batches (< 10k tuples).
@rschu1ze Thanks for referencing!
The code of the benchmark is available here:
@jalalmostafa Thank you! Also added here: ClickHouse/ClickBench#20
@yingxuanwangxuan I did not see any benchmark of ClickHouse vs Apache Doris, while I know that Doris is using many algorithms from ClickHouse and it should demonstrate good performance.
We should add Apache Doris to ClickBench:
https://github.com/ClickHouse/ClickBench
@jalalmostafa My friend @valyala said that it makes sense to also add VictoriaMetrics in one of the next studies.
He says that it should fit perfectly.
Thank you! VictoriaMetrics is an interesting database. We might include it later in our benchmarks!
ClickHouse vs InfluxDB vs MySQL: https://itnext.io/experience-sharing-clickhouse-performance-testing-9b913aa0daff
@alexey-milovidov How does ClickHouse compare with Doris on TPC-H?
I don't see an entry for this database at https://github.com/ClickHouse/ClickBench#systems-included . Is this because Apache Doris has already tested it themselves at https://doris.apache.org/docs/benchmark ? But that test runs on a completely different machine than the one ClickBench uses. Should a new issue be opened in the ClickBench tracker?
ClickHouse vs InfluxDB vs MySQL: https://itnext.io/experience-sharing-clickhouse-performance-testing-9b913aa0daff
Oops, the author deleted this Medium story.
@linghengqian I want Doris to be included in ClickBench; it is in the list under the Doris/PALO name (should probably rename it to Apache Doris).
ClickHouse vs MariaDB Column Store:
https://medium.com/datadenys/mariadb-column-store-installation-and-quick-overview-9911435e4574
ClickHouse vs ElasticSearch:
MASTER’S THESIS 2022 "Evaluating ClickHouse as a Big Data Processing Solution for IoT-Telemetry" - Adrian Göransson, Oskar Wändesjö: https://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=9078076&fileOId=9078077
Comparing performance with the baseline of storing raw files in Minio.
SigNoz (an observability platform based on ClickHouse) vs. ELK and Loki:
ClickHouse vs. Postgres:
https://aiven.io/blog/why-you-should-offload-your-pg-analytical-workloads-to-clickhouse
Great list. Heavily based on this page I'm trying to gather a meta benchmarks page for all databases:
https://www.timestored.com/time-series-data/time-series-database-benchmarks
If you have any feedback, please let me know.
When I started adding red marks for vendor-chosen benchmarks, almost everything turned red, and most of them are very closed, with no source code or repeatability. Well done on being very open with the ClickHouse benchmarks.
@ryanhamilton A correction to the article:
Clickbench is a large suite of benchmarks produced by clickhouse themselves. The focus is not time-series but a wide range of queries. They are being very transparent and open, i.e. on some queries they are beaten but the benchmarks only include open source choices.
ClickBench includes both open-source and proprietary DBMS, both self-managed and Cloud. For self-managed proprietary DBMS we include Kinetica and SingleStore. For Cloud proprietary DBMS we include Snowflake, Redshift, Athena, Aurora, SingleStore, Bytehouse.
ClickHouse vs. DuckDB on Parquet files: duckdb/duckdb#6478
ClickHouse vs. DuckDB on sorting in external memory: https://duckdb.org/2021/08/27/external-sorting.html
ClickHouse vs. RedShift on analysing Ethereum blockchain: https://clickhouse.com/blog/redshift-vs-clickhouse-comparison
ClickHouse vs. DuckDB on local data analysis: https://www.vantage.sh/blog/clickhouse-local-vs-duckdb
ClickHouse, ByConity, Doris, and Presto on TPC-DS: https://www.infoq.cn/article/SQCArsXNtZ9N1vEbLBqx
Clickhouse vs Qdrant (SVD) - https://blog.arguflow.ai/posts/clickhouse-vs-vector-database-qdrant
ClickHouse, TimescaleDB, QuestDB, InfluxDB, eXtremeDB:
https://www.mcobject.com/wp-content/uploads/dlm_uploads/2023/08/TSM-Bench-Benchmarking-Time-Series-Database-Systems.pdf
@alexey-milovidov This benchmark is based on the results from the TSM-Bench repository, which used InfluxDB 1.7 for the comparison with ClickHouse. There are 3 mainstream versions of InfluxDB:
- InfluxDB 1 - The latest version is 1.8, and it is written in Go
- InfluxDB 2 - The successor to InfluxDB 1, also written in Go. Though it is open source, the official helm chart is somehow still using InfluxDB 1.8 as of now (December 2023)
- InfluxDB 3 - The rewritten version, written in Rust. This one is closed source as of now (December 2023). They also claimed InfluxDB 3 is 45x faster than InfluxDB 1.8.