ggonnella / fastsubtrees

Python library and command line script , for fast extraction of subtrees of fairly large trees, consisting of millions of nodes, such as the NCBI taxonomy tree.

benchmark runs on one CPU core only?

KonradHoeffner opened this issue · comments

I am currently running the benchmark, which takes a long time, and it only seems to use one CPU core. Is that only the benchmark part or is fastsubtrees in general not parallelized?
Or is it just the SQL part?
At least modern PostgreSQL versions can run a single query on multiple cores, which is an important factor with current CPUs having around 6-16 cores.

In the paper there are two columns, "SQL real time" and "SQL CPU time". Subroot ID 2 has a CPU time of 5.87s and a real time of 131.03s, which is a factor of about 22.3. Does that mean it runs in parallel on your machine? The paper states that an Apple M1 Pro CPU was used, which has 10 CPU cores, so how was this superlinear speedup achieved?

fastsubtrees is not parallelized so far.

In general it is not trivial to parallelize the tree construction algorithm, and fastsubtrees is not parallel.

Regarding the SQL, it depends on the server that is used. I didn't enable or disable parallelization of the query. Extracting a subtree requires recursive queries, which are probably also difficult to run in parallel. The server used for the benchmarks is MariaDB.
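To illustrate what "recursive query" means here, the following is a minimal sketch of a subtree extraction with a recursive common table expression. It uses Python's built-in sqlite3 module instead of MariaDB so that it runs without a server, and the table name `nodes`, its columns, and the toy data are assumptions made for the example; it is not the query used in the benchmarks.

```python
import sqlite3

# Hypothetical toy tree as (node_id, parent_id) pairs. The root is its own
# parent, as in the NCBI taxonomy nodes.dmp file.
edges = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 4)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, parent_id INTEGER)")
con.executemany("INSERT INTO nodes VALUES (?, ?)", edges)


def subtree_ids(con, subroot):
    """Return the IDs of all nodes in the subtree rooted at `subroot`."""
    query = """
        WITH RECURSIVE subtree(node_id) AS (
            SELECT node_id FROM nodes WHERE node_id = ?
            UNION  -- UNION (not UNION ALL) drops the root's self-loop
            SELECT n.node_id
            FROM nodes AS n
            JOIN subtree AS s ON n.parent_id = s.node_id
        )
        SELECT node_id FROM subtree ORDER BY node_id
    """
    return [row[0] for row in con.execute(query, (subroot,))]


print(subtree_ids(con, 2))
```

Each recursion step depends on the rows found by the previous step, which is one reason such queries are hard to spread over several cores.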

Regarding the benchmarks, I don't think it is "fair" to parallelize them, i.e. to run multiple benchmarks at once, since the results of each benchmark would be affected too much by how busy the machine is with the other benchmarks.

The real time is indeed much longer, not shorter, than the CPU time, and this was consistently so across multiple measurements. I intend to repeat the measurements under Docker on other machines, but it has unfortunately become very common (at least at both institutions where I have access to computers) to forbid Docker for "security reasons"...

Oh, sorry for the mixup! I can share measurements of an Intel i9-12900k with 32 GB DDR5 5200 MHz dual channel.
If you want, I can later share results from an i9-10900k as well.

benchmarks.zip

FST

extract             511145              0                   0.08                0.03                0.12                160532
extract             83333               0                   0.08                0.03                0.11                157964
extract             562                 0                   0.08                0.02                0.11                159956
extract             561                 0                   0.08                0.03                0.11                160460
extract             543                 0                   0.09                0.03                0.12                158588
extract             91347               0                   0.08                0.03                0.12                160680
extract             1236                0                   0.10                0.03                0.13                158996
extract             1224                0                   0.12                0.02                0.15                159052
extract             2                   0                   0.18                0.02                0.20                157804
extract             511145              1                   0.08                0.03                0.11                157920
extract             83333               1                   0.08                0.03                0.11                161112
extract             562                 1                   0.08                0.03                0.11                160608
extract             561                 1                   0.08                0.02                0.11                158924
extract             543                 1                   0.10                0.01                0.12                160316
extract             91347               1                   0.09                0.02                0.12                159168
extract             1236                1                   0.11                0.01                0.13                161356
extract             1224                1                   0.12                0.02                0.15                160744
extract             2                   1                   0.17                0.03                0.20                158800
extract             511145              2                   0.08                0.03                0.12                160372
extract             83333               2                   0.07                0.03                0.11                161304
extract             562                 2                   0.08                0.02                0.11                161692
extract             561                 2                   0.09                0.02                0.12                158192
extract             543                 2                   0.08                0.04                0.12                161880
extract             91347               2                   0.09                0.02                0.12                157788
extract             1236                2                   0.10                0.02                0.12                158428
extract             1224                2                   0.12                0.03                0.15                159916
extract             2                   2                   0.17                0.02                0.20                160020

SQL

dbload                                  0                   0.47                0.36                45.95               43176
dbload                                  1                   0.43                0.36                50.72               43232
dbload                                  2                   0.40                0.40                44.24               43084
extract             511145              0                   0.22                0.01                2.18                43652
extract             83333               0                   0.24                0.01                4.84                43636
extract             562                 0                   0.26                0.02                4.90                50768
extract             561                 0                   0.29                0.00                6.60                53040
extract             543                 0                   0.36                0.02                38.53               93028
extract             91347               0                   0.43                0.02                41.75               109796
extract             1236                0                   0.96                0.11                60.59               303448
extract             1224                0                   1.67                0.11                84.27               525944
extract             2                   0                   3.65                0.30                129.21              1172764
extract             511145              1                   0.23                0.00                1.25                44016
extract             83333               1                   0.23                0.01                2.86                43748
extract             562                 1                   0.23                0.02                4.32                50628
extract             561                 1                   0.29                0.01                6.51                53300
extract             543                 1                   0.40                0.03                38.61               93148
extract             91347               1                   0.48                0.03                36.17               109960
extract             1236                1                   1.06                0.07                56.81               303472
extract             1224                1                   1.68                0.10                100.55              525524
extract             2                   1                   3.61                0.30                150.22              1170812
extract             511145              2                   0.23                0.01                1.26                43840
extract             83333               2                   0.22                0.01                2.81                43656
extract             562                 2                   0.25                0.01                4.46                50712
extract             561                 2                   0.27                0.00                6.27                52956
extract             543                 2                   0.36                0.02                36.74               93016
extract             91347               2                   0.40                0.05                39.05               109716
extract             1236                2                   1.13                0.06                58.99               303264
extract             1224                2                   1.88                0.14                93.88               525880
extract             2                   2                   3.71                0.25                147.79              1173160

ATTR

construct-genome_size                                       0                   2.08                0.02                2.11                164420
construct-genome_size                                       1                   1.99                0.03                2.03                161156
construct-genome_size                                       2                   1.98                0.04                2.03                164564
query-genome_size   511145              0                   0.12                0.06                0.18                325096
query-genome_size   83333               0                   0.14                0.04                0.18                325656
query-genome_size   562                 0                   0.12                0.06                0.18                321920
query-genome_size   561                 0                   0.13                0.04                0.18                323544
query-genome_size   543                 0                   0.15                0.06                0.21                324328
query-genome_size   91347               0                   0.15                0.06                0.22                324600
query-genome_size   1236                0                   0.23                0.04                0.27                324740
query-genome_size   1224                0                   0.27                0.06                0.34                325364
query-genome_size   2                   0                   0.48                0.04                0.52                326260
query-genome_size   511145              1                   0.14                0.04                0.18                323780
query-genome_size   83333               1                   0.13                0.05                0.18                322908
query-genome_size   562                 1                   0.13                0.05                0.19                322556
query-genome_size   561                 1                   0.11                0.07                0.18                324328
query-genome_size   543                 1                   0.16                0.06                0.22                324112
query-genome_size   91347               1                   0.15                0.05                0.21                324124
query-genome_size   1236                1                   0.21                0.04                0.26                322116
query-genome_size   1224                1                   0.29                0.04                0.34                327080
query-genome_size   2                   1                   0.46                0.07                0.53                328720
query-genome_size   511145              2                   0.12                0.05                0.17                324076
query-genome_size   83333               2                   0.13                0.05                0.18                323764
query-genome_size   562                 2                   0.13                0.05                0.18                323740
query-genome_size   561                 2                   0.13                0.05                0.18                324028
query-genome_size   543                 2                   0.14                0.06                0.21                324044
query-genome_size   91347               2                   0.13                0.07                0.21                324608
query-genome_size   1236                2                   0.19                0.08                0.27                324072
query-genome_size   1224                2                   0.28                0.06                0.35                324728
query-genome_size   2                   2                   0.45                0.07                0.52                324348
construct-GC_content                    2                   0                   2.02                0.04                2.06                161964
construct-GC_content                    2                   1                   1.98                0.04                2.04                163492
construct-GC_content                    2                   2                   1.99                0.03                2.02                162004
query-GC_content    511145              0                   0.12                0.05                0.18                322740
query-GC_content    83333               0                   0.13                0.04                0.18                323360
query-GC_content    562                 0                   0.11                0.06                0.18                322264
query-GC_content    561                 0                   0.12                0.06                0.18                325288
query-GC_content    543                 0                   0.14                0.06                0.21                323712
query-GC_content    91347               0                   0.15                0.07                0.22                327124
query-GC_content    1236                0                   0.21                0.04                0.26                325556
query-GC_content    1224                0                   0.29                0.05                0.35                325160
query-GC_content    2                   0                   0.47                0.05                0.53                329224
query-GC_content    511145              1                   0.11                0.06                0.17                321300
query-GC_content    83333               1                   0.12                0.06                0.18                325308
query-GC_content    562                 1                   0.12                0.05                0.18                321752
query-GC_content    561                 1                   0.13                0.06                0.19                326024
query-GC_content    543                 1                   0.15                0.05                0.20                323524
query-GC_content    91347               1                   0.15                0.06                0.21                324948
query-GC_content    1236                1                   0.21                0.06                0.27                322828
query-GC_content    1224                1                   0.29                0.05                0.35                327284
query-GC_content    2                   1                   0.44                0.06                0.51                327388
query-GC_content    511145              2                   0.12                0.05                0.18                325348
query-GC_content    83333               2                   0.13                0.04                0.17                322220
query-GC_content    562                 2                   0.13                0.05                0.19                324028
query-GC_content    561                 2                   0.12                0.06                0.18                323216
query-GC_content    543                 2                   0.14                0.05                0.20                325076
query-GC_content    91347               2                   0.16                0.05                0.22                324860
query-GC_content    1236                2                   0.22                0.05                0.27                323448
query-GC_content    1224                2                   0.29                0.04                0.34                324536
query-GC_content    2                   2                   0.46                0.06                0.52                328096

but it has unfortunately become very common (at least at both institutions where I have access to computers) to forbid Docker for "security reasons"...

This discussion happened at my institution as well, but as a compromise we may at least use Docker in rootless mode, which works quite well. Would this be an option for you, or is that forbidden as well? I used the installation script, which worked quite well.

In general it is not trivial to parallelize the tree construction algorithm, and fastsubtrees is not parallel.

Would it be possible to add a paragraph about this topic to the paper? Either explain why it does not make sense to parallelize it or, if it does make sense but is difficult to implement, add a future work part with pointers on what a parallel algorithm for that problem could look like and how others could build upon your work to implement it.

Thank you for the benchmark results! I see that the real time requirement of the SQL query is much higher also in this case, confirming the results from the MacBook.

I will have a look at the rootless mode. Thank you for the link to it.

Thank you for the benchmark results! I see that the real time requirement of the SQL query is much higher also in this case, confirming the results from the MacBook.

I'm not a database expert, but if I interpret this correctly, this means that there is a bottleneck somewhere other than the CPU, right? But what could this be? The network shouldn't be used at all if it's on a single machine. Storage also doesn't look that plausible to me, as the Docker image is only 1 GB in size and it doesn't have a volume. Maybe MariaDB is limited in RAM?
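The gap between CPU time and real time can be made explicit with a few lines of Python. This is only a sketch: the three rows are copied from the SQL results above (iteration 0), and the column order (step, subroot ID, iteration, user seconds, system seconds, real seconds, memory) is an assumption, since the tables have no header.

```python
# Three SQL "extract" rows copied from the results above (iteration 0).
# Assumed column order: step, subroot ID, iteration, user s, system s,
# real s, memory. The tables carry no header, so this is a guess.
rows = """\
extract 511145 0 0.22 0.01 2.18 43652
extract 543 0 0.36 0.02 38.53 93028
extract 2 0 3.65 0.30 129.21 1172764
"""


def waiting_share(line):
    """Return (subroot, cpu_seconds, real_seconds, share_not_on_cpu)."""
    _step, subroot, _iteration, user, system, real, _mem = line.split()
    cpu = float(user) + float(system)
    real = float(real)
    return subroot, cpu, real, 1.0 - cpu / real


for line in rows.splitlines():
    subroot, cpu, real, share = waiting_share(line)
    print(f"subroot {subroot}: cpu {cpu:.2f}s, real {real:.2f}s, "
          f"{share:.0%} of the real time not spent on the measured CPU time")
```

A parallel speedup would show the opposite pattern, with CPU time larger than real time. If the times are measured for the client process only, CPU time spent inside the database server would not be counted either, so a large share here does not by itself prove an I/O bottleneck.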

Snapshot of Docker stats some time during the second benchmark:

CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O         BLOCK I/O         PIDS
58897cad750e   fastsubtrees   96.25%    657.5MiB / 31.14GiB   2.06%     77kB / 54.7kB   5.37GB / 6.96GB   21

Some later time:

CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O          BLOCK I/O      PIDS
58897cad750e   fastsubtrees   91.79%    676.9MiB / 31.14GiB   2.12%     125kB / 89.7kB   9.06GB / 7GB   21

I see a lot of block I/O happening, which means disk writes, doesn't it? Even with a Samsung 980 Pro NVMe M.2 SSD that seems like a lot of writes, which could explain it.
Is it possible to use MariaDB in a memory-only mode?
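The two Docker stats snapshots above can be compared numerically. The sketch below parses the BLOCK I/O fields and prints how much each side grew between the snapshots. It assumes that the first value is the amount read and the second the amount written, and that the units are decimal (1 GB = 10^9 bytes); both assumptions are worth checking against the Docker documentation.

```python
# Decimal size units, as assumed for the BLOCK I/O column.
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def parse_size(text):
    """Convert a string such as '5.37GB' into a number of bytes."""
    # Try the two-letter suffixes first so that 'GB' is not read as 'B'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unknown size unit in {text!r}")


def parse_block_io(field):
    """Split 'X / Y' into two byte counts (assumed order: read, written)."""
    first, second = (parse_size(part.strip()) for part in field.split("/"))
    return first, second


earlier = parse_block_io("5.37GB / 6.96GB")  # first snapshot
later = parse_block_io("9.06GB / 7GB")       # second snapshot

read_delta = later[0] - earlier[0]
write_delta = later[1] - earlier[1]
print(f"first value grew by {read_delta / 1e9:.2f} GB, "
      f"second value grew by {write_delta / 1e9:.2f} GB")
```

If the read/written order assumed here is right, most of the growth between the two snapshots would be on the read side rather than the write side.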

I haven't used MariaDB yet, but could this help? https://mariadb.com/kb/en/memory-storage-engine/

Indeed there is a MEMORY storage engine https://mariadb.com/kb/en/memory-storage-engine/
However, just switching the engine would not suffice: since the tables are not stored on disk in this case, the data would have to be reloaded into the database every time a query is performed (on disk this already takes 40-50 seconds, see above). It might be possible to implement a copy from the normal database to a secondary database using the MEMORY engine and then perform the query on it.

See answer to issue #8

Since the construction of the tree is now much faster, the benchmarks also take less time.
All benchmarks are now always run, including the tree construction benchmarks.

Can confirm, the benchmark is now much faster and runs in 28s on my PC. Parallelization of the benchmark is thus no longer necessary.

$ time docker exec fastsubtreesC benchmarks
# NCBI dumps found...
# NCBI taxonomy tree found...
# Running the fastsubtrees tree construction benchmarks...
Step construct, iteration 0...
2022-10-20 12:32:17 INFO: Constructing temporary parents table...
2022-10-20 12:32:17 INFO: Reading data from file "/fastsubtrees/ntdumps/nodes.dmp" ...
2449599it [00:01, 1387186.82it/s]
2022-10-20 12:32:19 INFO: Constructing subtree sizes table...
100%|██████████| 2987600/2987600 [00:05<00:00, 522780.10it/s]
2022-10-20 12:32:24 INFO: Computing depth-first tree traversal order...
100%|██████████| 2987600/2987600 [00:01<00:00, 2762893.08it/s]
2022-10-20 12:32:26 INFO: Finalize index of nodes positions in depth-first traversal...
100%|██████████| 2987600/2987600 [00:00<00:00, 6724502.87it/s]
2022-10-20 12:32:26 SUCCESS: Tree data structure constructed
2022-10-20 12:32:26 INFO: Tree written to file "/fastsubtrees/nt.tree"
Step construct, iteration 1...
2022-10-20 12:32:26 INFO: Constructing temporary parents table...
2022-10-20 12:32:26 INFO: Reading data from file "/fastsubtrees/ntdumps/nodes.dmp" ...
2449599it [00:01, 1383871.49it/s]
2022-10-20 12:32:28 INFO: Constructing subtree sizes table...
100%|██████████| 2987600/2987600 [00:05<00:00, 521251.15it/s]
2022-10-20 12:32:34 INFO: Computing depth-first tree traversal order...
100%|██████████| 2987600/2987600 [00:01<00:00, 2746442.62it/s]
2022-10-20 12:32:35 INFO: Finalize index of nodes positions in depth-first traversal...
100%|██████████| 2987600/2987600 [00:00<00:00, 6779932.97it/s]
2022-10-20 12:32:35 SUCCESS: Tree data structure constructed
2022-10-20 12:32:35 INFO: Tree written to file "/fastsubtrees/nt.tree"
Step construct, iteration 2...
2022-10-20 12:32:35 INFO: Constructing temporary parents table...
2022-10-20 12:32:35 INFO: Reading data from file "/fastsubtrees/ntdumps/nodes.dmp" ...
2449599it [00:01, 1369837.79it/s]
2022-10-20 12:32:37 INFO: Constructing subtree sizes table...
100%|██████████| 2987600/2987600 [00:05<00:00, 518129.23it/s]
2022-10-20 12:32:43 INFO: Computing depth-first tree traversal order...
100%|██████████| 2987600/2987600 [00:01<00:00, 2762374.15it/s]
2022-10-20 12:32:44 INFO: Finalize index of nodes positions in depth-first traversal...
100%|██████████| 2987600/2987600 [00:00<00:00, 6855654.62it/s]
2022-10-20 12:32:45 SUCCESS: Tree data structure constructed
2022-10-20 12:32:45 INFO: Tree written to file "/fastsubtrees/nt.tree"
# Done. The results are in /fastsubtrees/benchmarks_construct.tsv
# To copy out of the container use:
# docker cp 07dbddfca720:/fastsubtrees/benchmarks_construct.tsv /fastsubtrees/benchmarks_construct.tsv

docker exec fastsubtreesC benchmarks  0.03s user 0.02s system 0% cpu 27.947 total
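The log above names the construction steps: a parents table, a subtree sizes table, a depth-first traversal order, and an index of node positions in that order. The following is a simplified sketch of how such tables can be built and used, assuming the input is a mapping from node ID to parent ID in which the root is its own parent. The function names and the order of the steps are chosen for the example and are not the fastsubtrees implementation.

```python
def build_tree(parents):
    """Build depth-first order, position index and subtree sizes.

    `parents` maps node ID -> parent ID; the root is its own parent.
    """
    children = {}
    root = None
    for node, parent in parents.items():
        if node == parent:
            root = node
        else:
            children.setdefault(parent, []).append(node)

    # Iterative depth-first traversal (pre-order), avoiding recursion
    # limits on deep trees.
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children.get(node, [])))

    # Index of each node's position in the depth-first order.
    position = {node: i for i, node in enumerate(order)}

    # Subtree sizes: walking the pre-order backwards visits every node
    # after all of its descendants, so each size is final when added.
    size = {node: 1 for node in order}
    for node in reversed(order):
        if node != root:
            size[parents[node]] += size[node]

    return order, position, size


def subtree(order, position, size, subroot):
    """A subtree is a contiguous slice of the depth-first order."""
    start = position[subroot]
    return order[start:start + size[subroot]]


# Hypothetical toy tree, not NCBI data.
parents = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 4}
order, position, size = build_tree(parents)
print(subtree(order, position, size, 2))
```

With these tables, extracting a subtree needs no traversal at query time, only one index lookup and one slice.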