Test results suggest TimescaleDB performance is not good
JH-D opened this issue · comments
Description
I followed the docs and finished the test, but my results are quite different from the docs, and much slower.
For example, "Overall query rate 0.31 queries/sec".
Is something wrong?
Environment info
Operating System: Ubuntu 20.04.2 LTS
TimescaleDB version: 2.2.1 (Run from docker image: timescale/timescaledb-postgis:latest-pg12)
PostgreSQL version: PostgreSQL 12.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.2.1_pre1) 10.2.1 20201203, 64-bit
Hardware: 32 CPUs, 64 GB RAM, 2 × 2 TB disks (not SSD)
Step 1. Generate data
root:/data2/tsbs/bin# ./tsbs_generate_data --use-case="iot" --seed=123 --scale=4000 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-04T00:00:00Z" --log-interval="10s" --format="timescaledb" | gzip > /data3/tmp/timescaledb-data.gz
Step 2. Insert data
root:/data2/tsbs/bin# cat /data3/tmp/timescaledb-data.gz | gunzip | /data2/tsbs/bin/tsbs_load_timescaledb \
> --host="" --port= --pass="" \
> --user="postgres" --workers=8 \
> --in-table-partition-tag=true --chunk-time=8h --write-profile= \
> --field-index-count=1 --do-create-db=true --force-text-format=false \
> --do-abort-on-exist=false
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
1630503773,1120116.12,1.120143E+07,1120116.12,223994.58,2.240000E+06,223994.58
...
1630505823,379801.23,9.286420E+08,450796.99,75998.57,1.857300E+08,90160.18
1630505833,260677.98,9.312487E+08,449878.56,52001.11,1.862500E+08,89975.84
Summary:
loaded 933526165 metrics in 2079.319sec with 8 workers (mean rate 448957.71 metrics/sec)
loaded 186706431 rows in 2079.319sec with 8 workers (mean rate 89792.12 rows/sec)
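As a quick sanity check, the summary's mean rates are just the reported totals divided by wall-clock time; a small Python sketch using the figures above:

```python
# Verify the load summary's mean rates from the reported totals.
metrics_total = 933_526_165
rows_total = 186_706_431
seconds = 2079.319

metric_rate = metrics_total / seconds         # ~448957.7 metrics/sec (reported: 448957.71)
row_rate = rows_total / seconds               # ~89792.1 rows/sec (reported: 89792.12)
metrics_per_row = metrics_total / rows_total  # ~5 metrics per row on average

print(round(metric_rate, 1), round(row_rate, 1), round(metrics_per_row, 2))
```

The ~5 metrics per row is plausible for the iot data, which mixes wider `readings` rows with narrower `diagnostics` rows.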
Step 3. breakdown-frequency Test
#Generate query
/data2/tsbs/bin/tsbs_generate_queries --use-case="iot" --seed=123 --scale=4000 \
--timestamp-start="2016-01-01T00:00:00Z" \
--timestamp-end="2016-01-04T00:00:01Z" \
--queries=1000 --query-type="breakdown-frequency" --format="timescaledb" \
| gzip > /data3/tmp/timescaledb-queries-breakdown-frequency.gz
#Run query test
root@adminuser-PowerEdge-R730xd:/data2/tsbs/bin# cat /data3/tmp/timescaledb-queries-breakdown-frequency.gz | \
> gunzip | /data2/tsbs/bin/tsbs_run_queries_timescaledb --workers=8 \
> --hosts="" --port= --pass=""
After 100 queries with 8 workers:
Interval query rate: 0.30 queries/sec Overall query rate: 0.30 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 21090.30ms, med: 23217.15ms, mean: 26185.40ms, max: 81850.37ms, stddev: 11152.01ms, sum: 2618.5sec, count: 100
all queries :
min: 21090.30ms, med: 23217.15ms, mean: 26185.40ms, max: 81850.37ms, stddev: 11152.01ms, sum: 2618.5sec, count: 100
After 200 queries with 8 workers:
Interval query rate: 0.31 queries/sec Overall query rate: 0.30 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 21090.30ms, med: 22921.22ms, mean: 25807.75ms, max: 81850.37ms, stddev: 10476.78ms, sum: 5161.6sec, count: 200
all queries :
min: 21090.30ms, med: 22921.22ms, mean: 25807.75ms, max: 81850.37ms, stddev: 10476.78ms, sum: 5161.6sec, count: 200
After 300 queries with 8 workers:
Interval query rate: 0.31 queries/sec Overall query rate: 0.31 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 20916.22ms, med: 22945.79ms, mean: 25783.36ms, max: 81850.37ms, stddev: 9573.45ms, sum: 7735.0sec, count: 300
all queries :
min: 20916.22ms, med: 22945.79ms, mean: 25783.36ms, max: 81850.37ms, stddev: 9573.45ms, sum: 7735.0sec, count: 300
After 400 queries with 8 workers:
Interval query rate: 0.31 queries/sec Overall query rate: 0.31 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 20916.22ms, med: 22948.86ms, mean: 25889.35ms, max: 81850.37ms, stddev: 9183.84ms, sum: 10355.7sec, count: 400
all queries :
min: 20916.22ms, med: 22948.86ms, mean: 25889.35ms, max: 81850.37ms, stddev: 9183.84ms, sum: 10355.7sec, count: 400
After 500 queries with 8 workers:
Interval query rate: 0.29 queries/sec Overall query rate: 0.30 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 20916.22ms, med: 23014.40ms, mean: 25976.37ms, max: 81850.37ms, stddev: 8939.94ms, sum: 12988.2sec, count: 500
all queries :
min: 20916.22ms, med: 23014.40ms, mean: 25976.37ms, max: 81850.37ms, stddev: 8939.94ms, sum: 12988.2sec, count: 500
After 600 queries with 8 workers:
Interval query rate: 0.31 queries/sec Overall query rate: 0.31 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 20916.22ms, med: 23053.31ms, mean: 26055.84ms, max: 81850.37ms, stddev: 8789.53ms, sum: 15633.5sec, count: 600
all queries :
min: 20916.22ms, med: 23053.31ms, mean: 26055.84ms, max: 81850.37ms, stddev: 8789.53ms, sum: 15633.5sec, count: 600
After 700 queries with 8 workers:
Interval query rate: 0.31 queries/sec Overall query rate: 0.31 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 20916.22ms, med: 23067.65ms, mean: 26085.63ms, max: 81850.37ms, stddev: 8656.11ms, sum: 18259.9sec, count: 700
all queries :
min: 20916.22ms, med: 23067.65ms, mean: 26085.63ms, max: 81850.37ms, stddev: 8656.11ms, sum: 18259.9sec, count: 700
After 800 queries with 8 workers:
Interval query rate: 0.30 queries/sec Overall query rate: 0.30 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 20916.22ms, med: 23049.22ms, mean: 26045.98ms, max: 81850.37ms, stddev: 8542.09ms, sum: 20836.8sec, count: 800
all queries :
min: 20916.22ms, med: 23049.22ms, mean: 26045.98ms, max: 81850.37ms, stddev: 8542.09ms, sum: 20836.8sec, count: 800
After 900 queries with 8 workers:
Interval query rate: 0.31 queries/sec Overall query rate: 0.31 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 20916.22ms, med: 23035.90ms, mean: 26035.71ms, max: 81850.37ms, stddev: 8432.79ms, sum: 23432.1sec, count: 900
all queries :
min: 20916.22ms, med: 23035.90ms, mean: 26035.71ms, max: 81850.37ms, stddev: 8432.79ms, sum: 23432.1sec, count: 900
After 1000 queries with 8 workers:
Interval query rate: 0.31 queries/sec Overall query rate: 0.31 queries/sec
TimescaleDB truck breakdown frequency per model:
min: 17748.99ms, med: 23073.79ms, mean: 26070.81ms, max: 81850.37ms, stddev: 8391.01ms, sum: 26070.8sec, count: 1000
all queries :
min: 17748.99ms, med: 23073.79ms, mean: 26070.81ms, max: 81850.37ms, stddev: 8391.01ms, sum: 26070.8sec, count: 1000
Run complete after 1000 queries with 8 workers (Overall query rate 0.31 queries/sec):
TimescaleDB truck breakdown frequency per model:
min: 17748.99ms, med: 23073.79ms, mean: 26070.81ms, max: 81850.37ms, stddev: 8391.01ms, sum: 26070.8sec, count: 1000
all queries :
min: 17748.99ms, med: 23073.79ms, mean: 26070.81ms, max: 81850.37ms, stddev: 8391.01ms, sum: 26070.8sec, count: 1000
wall clock time: 3265.990829sec
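These throughput numbers are internally consistent: with 8 workers kept busy, the query rate is roughly the worker count divided by the mean latency. A small Python check against the summary above:

```python
# Throughput ~= workers / mean latency when all workers are saturated.
workers = 8
mean_latency_s = 26.07081  # mean per-query latency from the final summary
total_queries = 1000

rate = workers / mean_latency_s                        # ~0.31 queries/sec, as reported
wall_clock = total_queries * mean_latency_s / workers  # ~3259 s vs. ~3266 s measured

print(round(rate, 2), round(wall_clock))
```

So the 0.31 queries/sec is not a separate throughput cap; it follows directly from each query taking ~26 seconds.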
Step 4. last-loc Test
This produced no log output even after waiting for hours, so I had to cancel it.
#Generate query
/data2/tsbs/bin/tsbs_generate_queries --use-case="iot" --seed=123 --scale=4000 \
--timestamp-start="2016-01-01T00:00:00Z" \
--timestamp-end="2016-01-04T00:00:01Z" \
--queries=1000 --query-type="last-loc" --format="timescaledb" \
| gzip > /data3/tmp/timescaledb-queries-last-loc.gz
#Run query test
cat /data3/tmp/timescaledb-queries-last-loc.gz | \
gunzip | /data2/tsbs/bin/tsbs_run_queries_timescaledb --workers=8 \
--hosts="" --port= --pass=""
👋 @JH-D ,
Thanks for the detailed step-by-step, which made it really easy to reproduce here and give some feedback. I'll provide a few observations to start with and we can go from there.
Use cases (FYI)
The `iot` use case is the newest of the TSBS use cases and has not been maintained for feature updates at this point. There's only so much time, and our benchmarks (as well as most that other companies run with TSBS) use the `cpu-only` use case. It's just easier to calculate total metrics and such because of the even numbers.
This also means that some of the queries in the `iot` use case are not written efficiently for larger `scale` values (the number of "things" to create values for). I think there were plans to do more with the queries, but that just hasn't happened yet.
Specifically, these queries always query ALL data in the table every time. There are no time-range filters in the generated queries, unlike the `cpu-only`/`devops` use cases. Again, from a benchmarking perspective, this is generally an unrealistic query format.
So, take it for what it is: the blog post you pointed to was meant to demonstrate a new simulated dataset, and it hasn't been improved in more than two years.
`scale` size vs. blog post
Your test differed in one significant way: `scale` was set to 4,000 items, whereas the blog post only used 1,000. While that might not seem like a lot, jumping from 1k to 4k items generates ~130 million more rows (~65 million in the `readings` table, ~65 million in the `diagnostics` table). Because the queries have no filter on `time`, scanning through 3x more data will simply take longer. For instance, I noticed on a differently sized machine with a large cache that there was still an external merge operation, which slows things down.
That's one of the big differences to recognize if you're trying to compare against that blog post.
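A rough back-of-the-envelope on the row counts, assuming the 10 s log interval and 3-day window from Step 1 (a Python sketch; actual totals come in lower because iot trucks go offline at times):

```python
# Upper bound on iot row counts: one row per truck per table every 10 seconds.
intervals = 3 * 24 * 3600 // 10  # 25920 log intervals over 3 days
tables = 2                       # readings + diagnostics

def max_rows(scale):
    return scale * intervals * tables

print(max_rows(4000))                   # 207,360,000 (actual load was ~186.7M)
print(max_rows(4000) - max_rows(1000))  # 155,520,000 theoretical delta vs. scale=1000
```

The offline behavior trims both numbers, which is roughly where the ~130 million figure lands in practice.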
Explaining the sample queries
The two queries you tried happen to be two of the worst given your significant increase in total rows, again because there are no time predicates on any of the sample queries.
The `last location` query, for instance, hasn't been updated to take advantage of SkipScan, which was introduced in TimescaleDB 2.3. When I add the appropriate index and run an updated `DISTINCT` query, it returns on 4k items in a few hundred milliseconds. Without SkipScan, the query as written (because there is no time predicate) will always perform poorly (a known PostgreSQL issue). Read more: https://blog.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
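For reference, a SkipScan-friendly shape for a last-point query looks roughly like this. This is a hedged sketch: the index and the `tags_id`/`latitude`/`longitude` column names are assumptions about the TSBS iot schema, not the exact query TSBS generates.

```sql
-- Assumed layout: readings(time, tags_id, latitude, longitude, ...).
-- A (tags_id, time DESC) index lets SkipScan (TimescaleDB >= 2.3) jump
-- between distinct tags instead of scanning every row.
CREATE INDEX ON readings (tags_id, time DESC);

-- Last known location per truck via DISTINCT ON.
SELECT DISTINCT ON (tags_id) tags_id, time, latitude, longitude
FROM readings
ORDER BY tags_id, time DESC;
```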
The other query creates 10-minute `time_bucket`s for every item across all 4 days and then has to rescan the entire table to do a `LEAD()` query. As written, with no time predicate, it just gets slower as you add more and more tags (the number of which equals `scale`).
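By contrast, even a coarse time predicate lets TimescaleDB exclude whole chunks. A hedged sketch of the same bucketing pattern bounded to one day (column names assumed, not the query TSBS generates):

```sql
-- Bounding the scan to one day means only that day's chunks are read,
-- instead of the full hypertable.
SELECT tags_id,
       time_bucket('10 minutes', time) AS ten_min,
       avg(fuel_consumption) AS avg_fuel
FROM readings
WHERE time >= '2016-01-01' AND time < '2016-01-02'
GROUP BY tags_id, ten_min
ORDER BY tags_id, ten_min;
```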
Possible Docker issue?
I can't say for sure, but I ran your exact process on an EC2 instance that's similar in size (m5.4xlarge), though without Docker. My ingest rate from a separate client was significantly better than yours, and in line with the numbers we show in that blog post: about ~385,000 rows/sec. Your ingest was only ~85,000 rows/sec.
So while your machine is decently big, something appears to be limiting the overall performance of PostgreSQL. Have you tried verifying the resources Docker is allowed to use, or running TimescaleDB outside of Docker?
Final thoughts
We do our best to maintain TSBS and update queries, etc. That said, we do know that `iot` needs some attention; however, our benchmarking standard is `cpu-only`, and it's the use case all other databases run as well when they publish benchmarking results. Consider using those queries as a starting place, because they attempt to run queries with various time ranges (and a query appropriately written to get the "last value"), and see where that puts you.
@ryanbooz
Thanks for the reply.
Pre
At the beginning I read the introduction article.
Then I ran the experiment according to GitHub's README.md, which is why I used `scale` 4000 and the sample queries. I figured the tutorial is the baseline, so I should finish it first.
Next
Will try the `cpu-only` (mainly) and `devops` use cases.
Will try installing TimescaleDB on Ubuntu, not in Docker, and re-run the tests.
Curious to hear the results of your Docker vs. non-Docker test.