When running diamond blastp in multiprocessing mode, some processes hang or segfault non-deterministically.
beazerj opened this issue · comments
I'm running diamond blastp in multiprocessing mode on multiple machines (gcloud c2-cpu-standard-60, 60 CPUs, 260 GB memory). Here is the specific command for the blastp search:
diamond blastp -q seqs.faa -d seqs -o out -f 6 qseqid sseqid corrected_bitscore --approx-id 50 --query-cover 90 -k1000 -c1 --more-sensitive -b6 --multiprocessing --tmpdir tmp --parallel_tmp --log
During the run, I'm observing that some of the processes either hang or segfault. After recovering with the --mp-recover option and restarting the alignment process, some of these processes complete (some may still fail). The hang or segfault typically occurs at the "Computing Alignments..." step. Peak RSS is 115 GB.
I've run this command on anywhere from 8 to 72 nodes and at multiple levels of sensitivity. It doesn't seem to depend on the number of nodes, and I've seen it at every sensitivity level I've tried: fast, default and more-sensitive. I've tried both the v2.1.8 and v2.1.9 releases of diamond.
This may be related to #732 and #747. The issue poster in #732 mentioned that their issue was resolved by downgrading to v2.0.15. If I were to make this downgrade, would it make a meaningful difference to the quality or speed of the alignment?
There could be some merit to the idea that this issue occurs when trying to align a small number of sequences. Running the diamond deepclust workflow with the same steps (fast, default, more-sensitive) but on a single machine with more memory (900 GB), such that there are only 4 blocks instead of 12, I don't see the segfault issue, except that this takes many days to complete.
I'm having a similar issue. I have over 10,000 analyses to run, so I use a Python for loop to run blastp on each one individually:
diamond blastp --more-sensitive -p 40 -q {input_file} -d {dmnd} --evalue 1e-5 -f 6 --out {result} --query-cover cover --subject-cover cover -k 0 --id 40
However, for some reason, diamond stops partway through on one of the runs, even though that run doesn't contain many sequences. The error seems to be memory-related, as it only happens when my server is running other (non-diamond) tasks at the same time. In reality, though, the server has plenty of memory and CPU left over.
My diamond version is v2.1.8.162.
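Since the hang and the segfault look different from the calling side, a loop like the one described above could be guarded roughly as follows. This is only a minimal sketch under assumptions: the helper names (`run_with_timeout`, `diamond_cmd`, `run_all`), the timeout, and the retry count are hypothetical and not part of diamond; the diamond flags are copied from the command above. It does not fix the underlying problem, it only makes stuck or crashed runs visible so they can be retried.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run one command and classify the outcome as ok / hang / crash / error."""
    try:
        # subprocess.run kills the child if the timeout expires.
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX a negative return code means the process died from a
        # signal (for example -11 for SIGSEGV).
        return "crash"
    return "error"


def diamond_cmd(input_file, dmnd, result, cover, threads=40):
    """Build the blastp command from the post above (hypothetical helper)."""
    return [
        "diamond", "blastp", "--more-sensitive", "-p", str(threads),
        "-q", input_file, "-d", dmnd, "--evalue", "1e-5", "-f", "6",
        "--out", result,
        "--query-cover", str(cover), "--subject-cover", str(cover),
        "-k", "0", "--id", "40",
    ]


def run_all(commands, timeout_s, retries=1):
    """Run each named command, retrying failures; return {name: last_status}."""
    failed = {}
    for name, cmd in commands.items():
        for _ in range(retries + 1):
            status = run_with_timeout(cmd, timeout_s)
            if status == "ok":
                failed.pop(name, None)
                break
            failed[name] = status
    return failed
```

With a dict such as `{"sample1": diamond_cmd("sample1.faa", "ref", "sample1.tsv", 50)}` (file names made up for illustration), `run_all(jobs, timeout_s=3600)` would return only the runs that still hang, crash, or exit with an error after the retry, which makes it easier to see whether the failures cluster on particular inputs.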