benchmark runs on one CPU core only?
KonradHoeffner opened this issue
I am currently running the benchmark, which takes a long time, and it only seems to use one CPU core. Is that only the benchmark part or is fastsubtrees in general not parallelized?
Or is it just the SQL part?
At least modern PostgreSQL versions can run a single query on multiple cores, which is an important factor with current CPUs having around 6-16 cores.
In the paper there are two columns, "SQL real time" and "SQL CPU time". For subroot ID 2 the CPU time is 5.87s and the real time is 131.03s, which is a factor of about 22.3. Does that mean it runs in parallel on your machine? The paper states that an Apple M1 Pro CPU was used, which has 10 CPU cores, so how was this superlinear speedup achieved?
fastsubtrees is not parallelized so far.
In general, it is not trivial to parallelize the tree construction algorithm, and fastsubtrees is not parallel.
Regarding the SQL, it depends on the server that is used. I neither enabled nor disabled parallelisation of the query. Due to the nature of the task, recursive queries are necessary, which are probably difficult to run in parallel as well. The server used for the benchmarks is MariaDB.
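To illustrate what "recursive queries are necessary" means here, the following is a minimal sketch of a subtree extraction with a recursive common table expression. It uses Python's built-in sqlite3 module instead of MariaDB so that it runs without a server, and the table name `nodes`, its columns and the helper `subtree_ids` are assumptions made for this example, not the schema or code used in the benchmarks.

```python
import sqlite3

# Hypothetical schema: one row per node, pointing at its parent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent INTEGER)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 4)],
)


def subtree_ids(conn, root):
    """Return the IDs of root and all of its descendants, sorted."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id FROM nodes AS n JOIN subtree AS s ON n.parent = s.id
        )
        SELECT id FROM subtree
        """,
        (root,),
    ).fetchall()
    return sorted(row[0] for row in rows)


print(subtree_ids(conn, 2))
```

Each recursion step depends on the rows found by the previous one, which is one plausible reason why such a query is hard to spread over several cores.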
Regarding the benchmarks, I don't think it is "fair" to parallelize them, i.e. to run multiple benchmarks at once, since the results of a benchmark would be affected too much by how busy the machine is with other benchmarks.
The real time is indeed much slower, not faster, than the CPU time, and this was consistently so across multiple measurements. I intend to repeat the measurements under Docker on other machines, but it has unfortunately become very common (at least at both institutions whose computers I have access to) to forbid Docker for "security reasons"...
Oh, sorry for the mixup! I can share measurements of an Intel i9-12900k with 32 GB DDR5 5200 MHz dual channel.
If you want, I can later share results from an i9-10900k as well.
FST
extract 511145 0 0.08 0.03 0.12 160532
extract 83333 0 0.08 0.03 0.11 157964
extract 562 0 0.08 0.02 0.11 159956
extract 561 0 0.08 0.03 0.11 160460
extract 543 0 0.09 0.03 0.12 158588
extract 91347 0 0.08 0.03 0.12 160680
extract 1236 0 0.10 0.03 0.13 158996
extract 1224 0 0.12 0.02 0.15 159052
extract 2 0 0.18 0.02 0.20 157804
extract 511145 1 0.08 0.03 0.11 157920
extract 83333 1 0.08 0.03 0.11 161112
extract 562 1 0.08 0.03 0.11 160608
extract 561 1 0.08 0.02 0.11 158924
extract 543 1 0.10 0.01 0.12 160316
extract 91347 1 0.09 0.02 0.12 159168
extract 1236 1 0.11 0.01 0.13 161356
extract 1224 1 0.12 0.02 0.15 160744
extract 2 1 0.17 0.03 0.20 158800
extract 511145 2 0.08 0.03 0.12 160372
extract 83333 2 0.07 0.03 0.11 161304
extract 562 2 0.08 0.02 0.11 161692
extract 561 2 0.09 0.02 0.12 158192
extract 543 2 0.08 0.04 0.12 161880
extract 91347 2 0.09 0.02 0.12 157788
extract 1236 2 0.10 0.02 0.12 158428
extract 1224 2 0.12 0.03 0.15 159916
extract 2 2 0.17 0.02 0.20 160020
SQL
dbload 0 0.47 0.36 45.95 43176
dbload 1 0.43 0.36 50.72 43232
dbload 2 0.40 0.40 44.24 43084
extract 511145 0 0.22 0.01 2.18 43652
extract 83333 0 0.24 0.01 4.84 43636
extract 562 0 0.26 0.02 4.90 50768
extract 561 0 0.29 0.00 6.60 53040
extract 543 0 0.36 0.02 38.53 93028
extract 91347 0 0.43 0.02 41.75 109796
extract 1236 0 0.96 0.11 60.59 303448
extract 1224 0 1.67 0.11 84.27 525944
extract 2 0 3.65 0.30 129.21 1172764
extract 511145 1 0.23 0.00 1.25 44016
extract 83333 1 0.23 0.01 2.86 43748
extract 562 1 0.23 0.02 4.32 50628
extract 561 1 0.29 0.01 6.51 53300
extract 543 1 0.40 0.03 38.61 93148
extract 91347 1 0.48 0.03 36.17 109960
extract 1236 1 1.06 0.07 56.81 303472
extract 1224 1 1.68 0.10 100.55 525524
extract 2 1 3.61 0.30 150.22 1170812
extract 511145 2 0.23 0.01 1.26 43840
extract 83333 2 0.22 0.01 2.81 43656
extract 562 2 0.25 0.01 4.46 50712
extract 561 2 0.27 0.00 6.27 52956
extract 543 2 0.36 0.02 36.74 93016
extract 91347 2 0.40 0.05 39.05 109716
extract 1236 2 1.13 0.06 58.99 303264
extract 1224 2 1.88 0.14 93.88 525880
extract 2 2 3.71 0.25 147.79 1173160
ATTR
construct-genome_size 0 2.08 0.02 2.11 164420
construct-genome_size 1 1.99 0.03 2.03 161156
construct-genome_size 2 1.98 0.04 2.03 164564
query-genome_size 511145 0 0.12 0.06 0.18 325096
query-genome_size 83333 0 0.14 0.04 0.18 325656
query-genome_size 562 0 0.12 0.06 0.18 321920
query-genome_size 561 0 0.13 0.04 0.18 323544
query-genome_size 543 0 0.15 0.06 0.21 324328
query-genome_size 91347 0 0.15 0.06 0.22 324600
query-genome_size 1236 0 0.23 0.04 0.27 324740
query-genome_size 1224 0 0.27 0.06 0.34 325364
query-genome_size 2 0 0.48 0.04 0.52 326260
query-genome_size 511145 1 0.14 0.04 0.18 323780
query-genome_size 83333 1 0.13 0.05 0.18 322908
query-genome_size 562 1 0.13 0.05 0.19 322556
query-genome_size 561 1 0.11 0.07 0.18 324328
query-genome_size 543 1 0.16 0.06 0.22 324112
query-genome_size 91347 1 0.15 0.05 0.21 324124
query-genome_size 1236 1 0.21 0.04 0.26 322116
query-genome_size 1224 1 0.29 0.04 0.34 327080
query-genome_size 2 1 0.46 0.07 0.53 328720
query-genome_size 511145 2 0.12 0.05 0.17 324076
query-genome_size 83333 2 0.13 0.05 0.18 323764
query-genome_size 562 2 0.13 0.05 0.18 323740
query-genome_size 561 2 0.13 0.05 0.18 324028
query-genome_size 543 2 0.14 0.06 0.21 324044
query-genome_size 91347 2 0.13 0.07 0.21 324608
query-genome_size 1236 2 0.19 0.08 0.27 324072
query-genome_size 1224 2 0.28 0.06 0.35 324728
query-genome_size 2 2 0.45 0.07 0.52 324348
construct-GC_content 2 0 2.02 0.04 2.06 161964
construct-GC_content 2 1 1.98 0.04 2.04 163492
construct-GC_content 2 2 1.99 0.03 2.02 162004
query-GC_content 511145 0 0.12 0.05 0.18 322740
query-GC_content 83333 0 0.13 0.04 0.18 323360
query-GC_content 562 0 0.11 0.06 0.18 322264
query-GC_content 561 0 0.12 0.06 0.18 325288
query-GC_content 543 0 0.14 0.06 0.21 323712
query-GC_content 91347 0 0.15 0.07 0.22 327124
query-GC_content 1236 0 0.21 0.04 0.26 325556
query-GC_content 1224 0 0.29 0.05 0.35 325160
query-GC_content 2 0 0.47 0.05 0.53 329224
query-GC_content 511145 1 0.11 0.06 0.17 321300
query-GC_content 83333 1 0.12 0.06 0.18 325308
query-GC_content 562 1 0.12 0.05 0.18 321752
query-GC_content 561 1 0.13 0.06 0.19 326024
query-GC_content 543 1 0.15 0.05 0.20 323524
query-GC_content 91347 1 0.15 0.06 0.21 324948
query-GC_content 1236 1 0.21 0.06 0.27 322828
query-GC_content 1224 1 0.29 0.05 0.35 327284
query-GC_content 2 1 0.44 0.06 0.51 327388
query-GC_content 511145 2 0.12 0.05 0.18 325348
query-GC_content 83333 2 0.13 0.04 0.17 322220
query-GC_content 562 2 0.13 0.05 0.19 324028
query-GC_content 561 2 0.12 0.06 0.18 323216
query-GC_content 543 2 0.14 0.05 0.20 325076
query-GC_content 91347 2 0.16 0.05 0.22 324860
query-GC_content 1236 2 0.22 0.05 0.27 323448
query-GC_content 1224 2 0.29 0.04 0.34 324536
query-GC_content 2 2 0.46 0.06 0.52 328096
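As a quick way to read these numbers, here is a small sketch that divides real time by CPU time for two of the rows above. It assumes that the three time columns are user, system and real time in seconds (for the FST and ATTR rows, user plus system adds up to the real time, which fits this reading). The function name `wait_ratio` is made up for this example.

```python
def wait_ratio(user_s, sys_s, real_s):
    """Real time divided by CPU time (user + system)."""
    return real_s / (user_s + sys_s)


# Rows copied from the measurements above (subroot ID 2, iteration 0),
# assuming the columns are user, system and real time in seconds.
fst_row = (0.18, 0.02, 0.20)
sql_row = (3.65, 0.30, 129.21)

for name, row in (("FST", fst_row), ("SQL", sql_row)):
    print(f"{name}: real/CPU = {wait_ratio(*row):.1f}")
```

A ratio close to 1 means the measured process kept one core busy for the whole run. A ratio well above 1, as for the SQL row, means the measured process spent most of the time waiting, for example on I/O or on work done in another process. Only a ratio below 1 would point to several cores being used at once.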
> but it has unfortunately become very common (at least at both institutions whose computers I have access to) to forbid Docker for "security reasons"...
This discussion happened at my institution as well, but as a compromise we may at least use Docker in rootless mode. Would this be an option for you, or is that forbidden as well? I used the installation script, which worked quite well.
> In general, it is not trivial to parallelize the tree construction algorithm, and fastsubtrees is not parallel.
Would it be possible to add a paragraph about this topic to the paper? Either explain why it does not make sense to parallelize it or, if it does make sense but is difficult to implement, add a future work part with pointers on what a parallel algorithm for that problem could look like and how others could build upon your work to implement it.
Thank you for the benchmark results! I see that the real time requirement of the SQL query is much higher also in this case, confirming the results from the MacBook.
I will have a look at the rootless mode. Thank you for the link to it.
> Thank you for the benchmark results! I see that the real time requirement of the SQL query is much higher also in this case, confirming the results from the MacBook.
I'm not a database expert, but if I interpret this correctly, it means that the bottleneck is somewhere other than the CPU, right? But what could it be? The network shouldn't be used at all if it's on a single machine. Storage also doesn't look that plausible to me, as the Docker image is only 1 GB in size and doesn't have a volume. Maybe MariaDB is limited in RAM?
Snapshot of Docker stats some time during the second benchmark:
CONTAINER ID   NAME           CPU %    MEM USAGE / LIMIT     MEM %   NET I/O          BLOCK I/O         PIDS
58897cad750e   fastsubtrees   96.25%   657.5MiB / 31.14GiB   2.06%   77kB / 54.7kB    5.37GB / 6.96GB   21
Some later time:
CONTAINER ID   NAME           CPU %    MEM USAGE / LIMIT     MEM %   NET I/O          BLOCK I/O         PIDS
58897cad750e   fastsubtrees   91.79%   676.9MiB / 31.14GiB   2.12%   125kB / 89.7kB   9.06GB / 7GB      21
I see a lot of block I/O happening, which means disk reads and writes, doesn't it? Even with a Samsung 980 Pro NVMe M.2 SSD that seems like a lot of I/O, which could explain it.
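To see how much of that is reads and how much is writes, the two BLOCK I/O fields can be compared. This is a rough sketch that assumes the field is formatted as "read / written" with decimal units (kB, MB, GB), as in the `docker stats` output above. The helper names `parse_size` and `parse_block_io` are invented for this example.

```python
import re

# Decimal unit factors, assuming docker stats reports block I/O in kB/MB/GB.
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def parse_size(text):
    """Convert a string such as '5.37GB' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([kMGT]?B)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_block_io(field):
    """Split a 'read / written' field into two byte counts."""
    read, written = field.split("/")
    return parse_size(read), parse_size(written)


# The BLOCK I/O values from the two snapshots above.
first = parse_block_io("5.37GB / 6.96GB")
later = parse_block_io("9.06GB / 7GB")
read_delta = later[0] - first[0]
write_delta = later[1] - first[1]
print(f"read +{read_delta / 1e9:.2f} GB, written +{write_delta / 1e9:.2f} GB")
```

Under these assumptions, the growth between the two snapshots is almost entirely on the read side, while the written amount barely changes.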
Is it possible to use MariaDB in a memory-only mode?
I haven't used MariaDB yet, but could this help? https://mariadb.com/kb/en/memory-storage-engine/
Indeed there is a MEMORY storage engine https://mariadb.com/kb/en/memory-storage-engine/
However, just switching the engine would not suffice: the tables are not stored on disk in this case, so the data would have to be reloaded into the database every time a query is performed (on disk this already takes 40-50 seconds, see above). It might be possible to implement a copy from the normal database to a secondary database using the MEMORY engine and then perform the query on it.
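The "copy to a secondary in-memory database, then query it" idea can be sketched as follows. Note that this uses SQLite through Python's sqlite3 module as a stand-in, because it runs without a server. It is not MariaDB's MEMORY engine, and the table `nodes` and its contents are made up for the example.

```python
import os
import sqlite3
import tempfile

# Build a small on-disk database standing in for the persistent tables.
path = os.path.join(tempfile.mkdtemp(), "nodes.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent INTEGER)")
disk.executemany("INSERT INTO nodes VALUES (?, ?)", [(1, None), (2, 1), (3, 2)])
disk.commit()

# Copy the whole database into memory once, then query only the copy.
mem = sqlite3.connect(":memory:")
disk.backup(mem)
disk.close()

children_of_1 = mem.execute(
    "SELECT id FROM nodes WHERE parent = ? ORDER BY id", (1,)
).fetchall()
print(children_of_1)
```

The copy is paid once per session rather than once per query, so whether this helps depends on how many queries follow and on how long the copy itself takes.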
See answer to issue #8
Since the construction of the tree is now much faster, the benchmarks also take less time.
All benchmarks are now always run, including the tree construction benchmarks.
Can confirm, the benchmark is much faster and now runs in 28s on my PC. Parallelization of the benchmark is thus not necessary anymore.
$ time docker exec fastsubtreesC benchmarks
# NCBI dumps found...
# NCBI taxonomy tree found...
# Running the fastsubtrees tree construction benchmarks...
Step construct, iteration 0...
2022-10-20 12:32:17 INFO: Constructing temporary parents table...
2022-10-20 12:32:17 INFO: Reading data from file "/fastsubtrees/ntdumps/nodes.dmp" ...
2449599it [00:01, 1387186.82it/s]
2022-10-20 12:32:19 INFO: Constructing subtree sizes table...
100%|██████████| 2987600/2987600 [00:05<00:00, 522780.10it/s]
2022-10-20 12:32:24 INFO: Computing depth-first tree traversal order...
100%|██████████| 2987600/2987600 [00:01<00:00, 2762893.08it/s]
2022-10-20 12:32:26 INFO: Finalize index of nodes positions in depth-first traversal...
100%|██████████| 2987600/2987600 [00:00<00:00, 6724502.87it/s]
2022-10-20 12:32:26 SUCCESS: Tree data structure constructed
2022-10-20 12:32:26 INFO: Tree written to file "/fastsubtrees/nt.tree"
Step construct, iteration 1...
2022-10-20 12:32:26 INFO: Constructing temporary parents table...
2022-10-20 12:32:26 INFO: Reading data from file "/fastsubtrees/ntdumps/nodes.dmp" ...
2449599it [00:01, 1383871.49it/s]
2022-10-20 12:32:28 INFO: Constructing subtree sizes table...
100%|██████████| 2987600/2987600 [00:05<00:00, 521251.15it/s]
2022-10-20 12:32:34 INFO: Computing depth-first tree traversal order...
100%|██████████| 2987600/2987600 [00:01<00:00, 2746442.62it/s]
2022-10-20 12:32:35 INFO: Finalize index of nodes positions in depth-first traversal...
100%|██████████| 2987600/2987600 [00:00<00:00, 6779932.97it/s]
2022-10-20 12:32:35 SUCCESS: Tree data structure constructed
2022-10-20 12:32:35 INFO: Tree written to file "/fastsubtrees/nt.tree"
Step construct, iteration 2...
2022-10-20 12:32:35 INFO: Constructing temporary parents table...
2022-10-20 12:32:35 INFO: Reading data from file "/fastsubtrees/ntdumps/nodes.dmp" ...
2449599it [00:01, 1369837.79it/s]
2022-10-20 12:32:37 INFO: Constructing subtree sizes table...
100%|██████████| 2987600/2987600 [00:05<00:00, 518129.23it/s]
2022-10-20 12:32:43 INFO: Computing depth-first tree traversal order...
100%|██████████| 2987600/2987600 [00:01<00:00, 2762374.15it/s]
2022-10-20 12:32:44 INFO: Finalize index of nodes positions in depth-first traversal...
100%|██████████| 2987600/2987600 [00:00<00:00, 6855654.62it/s]
2022-10-20 12:32:45 SUCCESS: Tree data structure constructed
2022-10-20 12:32:45 INFO: Tree written to file "/fastsubtrees/nt.tree"
# Done. The results are in /fastsubtrees/benchmarks_construct.tsv
# To copy out of the container use:
# docker cp 07dbddfca720:/fastsubtrees/benchmarks_construct.tsv /fastsubtrees/benchmarks_construct.tsv
docker exec fastsubtreesC benchmarks 0.03s user 0.02s system 0% cpu 27.947 total
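For readers wondering what the steps named in the log (subtree sizes, depth-first traversal order, index of node positions) are good for, here is a toy sketch of the general technique. It is not the fastsubtrees implementation, and the function names and the tiny example tree are assumptions for illustration only. The idea is that once the nodes are stored in depth-first order, a subtree is a contiguous slice whose length is the subtree size.

```python
from collections import defaultdict


def build_index(parents):
    """Build DFS order, positions and subtree sizes.

    parents maps node ID -> parent ID (the root maps to None).
    """
    children = defaultdict(list)
    root = None
    for node, parent in parents.items():
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    order = []  # node IDs in depth-first (pre-order) sequence
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the smallest ID is visited first.
        stack.extend(reversed(sorted(children[node])))

    position = {node: i for i, node in enumerate(order)}

    # In pre-order every node precedes its descendants, so a reverse
    # sweep sees all children before their parent.
    size = {node: 1 for node in order}
    for node in reversed(order):
        parent = parents[node]
        if parent is not None:
            size[parent] += size[node]
    return order, position, size


def subtree(order, position, size, root):
    """Return root and its descendants as one contiguous slice."""
    start = position[root]
    return order[start:start + size[root]]


parents = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 4}
order, position, size = build_index(parents)
print(subtree(order, position, size, 2))
```

Extracting a subtree then needs no recursion at query time, only one lookup and one slice.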