rderelle / Broccoli

orthology assignment using phylogenetic and network analyses

Error Diamond in step 2

Giacomoggioli4111 opened this issue

Hello,

I am trying to run your pipeline and encountered an error at the beginning of the second step:

Variable OMP_NUM_THREADS has been set to 26

            Broccoli v1.1


 --- STEP 1: kmer clustering

 # parameters
 input dir     : NR_proteomes
 kmer size     : 100
 kmer nb aa    : 15

 # check input files
 26 input files
 696519 sequences

 # kmer clustering
 26 proteomes on 26 threads
 -> 642030 proteins saved for the next step


 --- STEP 2: phylomes

 # parameters
 e_value     : 0.001
 nb_hits     : 6
 gaps        : 0.7
 phylogenies : neighbor joining
 threads     : 26

 # check input files
 26 input fasta files
 642030 sequences

 # build phylomes ... be patient
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/anaconda3/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/anaconda3/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/Broccoli/scripts/broccoli_step2.py", line 238, in process_file
    --compress 1 --more-sensitive -e ' + str(evalue) + ' -o ' + str(index_dir / search_output) + ' --outfmt 6 qseqid sseqid qstart qend sstart cigar 2>&1', shell=True)
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/anaconda3/broccoli_env/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/anaconda3/broccoli_env/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'diamond blastp --quiet --threads 1 --db dir_step2/databases/0.db --max-target-seqs 6 --query dir_step1/15.fas                 --compress 1 --more-sensitive -e 0.001 -o dir_step2/15/15_0.gz --outfmt 6 qseqid sseqid qstart qend sstart cigar 2>&1' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/Broccoli/broccoli.py", line 148, in <module>
    broccoli_step2.step2_phylomes(evalue, max_per_species, path_diamond, path_fasttree, trim_thres, phylo_method, nb_threads)
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/Broccoli/scripts/broccoli_step2.py", line 78, in step2_phylomes
    multithread_process_file(list_files, nb_threads)
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/Broccoli/scripts/broccoli_step2.py", line 151, in multithread_process_file
    results_2 = tmp_res.get()
  File "/data/SBCS-MartinDuranLab/03-Giacomo/src/anaconda3/broccoli_env/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
subprocess.CalledProcessError: Command 'diamond blastp --quiet --threads 1 --db dir_step2/databases/0.db --max-target-seqs 6 --query dir_step1/15.fas                 --compress 1 --more-sensitive -e 0.001 -o dir_step2/15/15_0.gz --outfmt 6 qseqid sseqid qstart qend sstart cigar 2>&1' returned non-zero exit status 1.

I tried running the Diamond command manually:

diamond blastp --quiet --threads 1 --db dir_step2/databases/0.db --max-target-seqs 6 --query dir_step1/15.fas --compress 1 --more-sensitive -e 0.001 -o dir_step2/15/15_0.gz --outfmt 6 qseqid sseqid qstart qend sstart cigar 2>&1

and got this error:

diamond v0.9.25.126 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

Error: Invalid output field: cigar
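
As the traceback above shows, Broccoli calls Diamond through subprocess.check_output(..., shell=True) with 2>&1, so the exception message only reports the command and its exit status, not what Diamond printed. A minimal sketch of surfacing Diamond's own message from Python; run_and_report is a hypothetical helper, not part of Broccoli, and it passes the command as an argument list instead of a shell string, so stderr is merged explicitly rather than with 2>&1:

```python
import subprocess
import sys
from typing import Sequence


def run_and_report(cmd: Sequence[str]) -> str:
    """Run a command; on failure, print its combined stdout/stderr, then re-raise."""
    try:
        # stderr=STDOUT merges both streams, like 2>&1 in the shell version.
        # universal_newlines=True returns text (text= only exists from Python 3.7).
        return subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True).stdout
    except subprocess.CalledProcessError as exc:
        # exc.output holds everything the tool printed before exiting.
        print("command failed with exit status {}:\n{}".format(exc.returncode, exc.output),
              file=sys.stderr)
        raise
```

With the Diamond command above split into a list (["diamond", "blastp", "--quiet", ...]), a failing run should print the "Invalid output field" message before the exception propagates.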

So I looked at your script broccoli_step2.py and noticed that cigar is not defined before line 238, where it is used in the Diamond command. I then checked whether the Diamond command would work without cigar, and it did: without cigar, it produced the file 15_0.gz.
I am not a Python expert, so I would like to hear your opinion on removing cigar from line 238 of broccoli_step2.py as a fix for my error.

Best,

Giacomo

Hi Giacomo,

You are getting this error because you are using Diamond v0.9.25 (see the version line in your output), and you need v0.9.30 or above.
The reason for this requirement is that the 'cigar' output field, which Broccoli uses to build alignments, was introduced in Diamond v0.9.30; older versions reject it with "Invalid output field: cigar".
nb: here, 'cigar' is one of Diamond's output fields (listed after --outfmt 6), not a Python variable.
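
To illustrate why Broccoli needs this field: a CIGAR string encodes the run lengths of aligned residues and gaps, so together with qstart and sstart it is enough to rebuild the gapped alignment of a hit. A minimal sketch, assuming SAM-style operations (M = aligned residues, I = residues present only in the query, D = residues present only in the subject); parse_cigar and gapped_alignment are hypothetical names, not Broccoli's own code:

```python
import re
from typing import List, Tuple

# One CIGAR operation, e.g. "12M", "3I", "1D" (SAM-style M/I/D assumed).
CIGAR_OP = re.compile(r"(\d+)([MID])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings that contain anything the regex did not consume.
    if "".join("{}{}".format(n, op) for n, op in ops) != cigar:
        raise ValueError("unsupported CIGAR string: {!r}".format(cigar))
    return ops


def gapped_alignment(query: str, subject: str, qstart: int, sstart: int,
                     cigar: str) -> Tuple[str, str]:
    """Rebuild the aligned query/subject strings, using '-' for gaps."""
    qi, si = qstart - 1, sstart - 1  # 1-based coordinates -> 0-based indices
    q_out, s_out = [], []
    for length, op in parse_cigar(cigar):
        if op == "M":  # aligned residues (match or mismatch)
            q_out.append(query[qi:qi + length])
            s_out.append(subject[si:si + length])
            qi += length
            si += length
        elif op == "I":  # residues in the query only -> gap in the subject
            q_out.append(query[qi:qi + length])
            s_out.append("-" * length)
            qi += length
        else:  # "D": residues in the subject only -> gap in the query
            q_out.append("-" * length)
            s_out.append(subject[si:si + length])
            si += length
    return "".join(q_out), "".join(s_out)
```

For example, gapped_alignment("MKTAYIAK", "MKAYIGAK", 1, 1, "2M1I3M1D2M") returns ("MKTAYI-AK", "MK-AYIGAK"). Without the cigar column, the positions of the gaps cannot be recovered from qseqid, sseqid, qstart, qend and sstart alone.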

It should work with a newer version of Diamond.
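
A quick way to catch this before launching a long run is to compare the installed Diamond version with the minimum. A minimal sketch; the (0, 9, 30) threshold follows the requirement stated above, the function names are hypothetical (not part of Broccoli), and installed_diamond_version assumes that `diamond version` prints a line containing the version number:

```python
import re
import subprocess
from typing import Tuple

MIN_DIAMOND = (0, 9, 30)  # minimum version assumed from the requirement above


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract a dotted version, e.g. from 'diamond v0.9.25.126 | by ...'."""
    match = re.search(r"\bv?(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in: {!r}".format(text))
    return tuple(int(part) for part in match.group(1).split("."))


def diamond_is_recent_enough(text: str, minimum: Tuple[int, ...] = MIN_DIAMOND) -> bool:
    # Tuples compare element-wise, so (0, 9, 25, 126) < (0, 9, 30).
    return parse_version(text) >= minimum


def installed_diamond_version(path_diamond: str = "diamond") -> Tuple[int, ...]:
    """Ask the Diamond binary for its version (assumes the 'version' subcommand)."""
    out = subprocess.run([path_diamond, "version"], stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True,
                         check=True).stdout
    return parse_version(out)
```

Comparing tuples of integers rather than strings avoids the trap where "0.9.25" and "0.9.3" would sort incorrectly as text.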

best,
Romain

nb2: without 'cigar', Diamond will still produce an output file, but Broccoli will certainly crash right after, as it won't be able to build alignments from the Diamond output.
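
Failing fast on that case is straightforward: with --outfmt 6 qseqid sseqid qstart qend sstart cigar, each hit is one tab-separated line with exactly six columns, and --compress 1 gzips the file. A minimal sketch of a reader that stops with a clear message when the cigar column is missing; Hit, parse_hit and read_hits are hypothetical names, not Broccoli's code:

```python
import gzip
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    qstart: int
    qend: int
    sstart: int
    cigar: str


FIELDS = ("qseqid", "sseqid", "qstart", "qend", "sstart", "cigar")


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated Diamond hit, requiring all six fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(FIELDS):
        raise ValueError("expected {} columns ({}), got {}: {!r}".format(
            len(FIELDS), " ".join(FIELDS), len(cols), line))
    qseqid, sseqid, qstart, qend, sstart, cigar = cols
    return Hit(qseqid, sseqid, int(qstart), int(qend), int(sstart), cigar)


def read_hits(path: str) -> Iterator[Hit]:
    """Yield hits from a gzip-compressed Diamond output file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_hit(line)
```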

Thank you for your reply.
I see. I thought I just needed Diamond 0.9.25 or above. I will try Diamond 0.9.35 and let you know.

All the best,

Giacomo

It worked perfectly! Thanks for your help!

Best,

Giacomo