rderelle / Broccoli

orthology assignment using phylogenetic and network analyses



Same results for -ratio_ortho 0.3, 0.5 and 0.7: is this expected?

V-JJ opened this issue

Hello!

We ran Broccoli with three different ratio_ortho values: 0.5 (the default), 0.3 and 0.7.
The results turned out to be IDENTICAL for all three values. Is this expected? We double-checked that our commands and jobs ran correctly.

Here is the command (shown for r=0.3):

# Input data location
proteome_dir=input_dataset_v1
nthr=8
r=0.3

# ML phylogeny method
phylo_method=ml

mkdir -p ML_parameter_r03

python broccoli.py -dir $proteome_dir -phylogenies $phylo_method -threads $nthr -ratio_ortho $r \
        -path_diamond $HOME/Programs/diamond_2.1.4/diamond \
        -path_fasttree $HOME/Programs/FastTree/FastTree

mv dir_* ML_parameter_r03
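For reference, the three runs can also be scripted in one go. This is a hypothetical sketch: it reuses only the flags from the command above, and the folder names ML_parameter_r05 and ML_parameter_r07 are assumptions that simply extend the r03 naming. Deriving the folder name from r means each run's dir_* folders land in the folder matching its ratio_ortho value.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one Broccoli run per ratio_ortho value, each moved into
# its own folder. Folder names extend the ML_parameter_r03 convention (assumption).
set -e  # stop before the mv if a Broccoli run fails

proteome_dir=input_dataset_v1
nthr=8
phylo_method=ml

# Map a ratio such as 0.3 to a folder name such as ML_parameter_r03.
outdir_for() {
    printf 'ML_parameter_r%s\n' "$(printf '%s' "$1" | tr -d .)"
}

# Run the full pipeline for one ratio value, then move its dir_* folders aside.
run_one() {
    local r=$1
    local out
    out=$(outdir_for "$r")
    mkdir -p "$out"
    python broccoli.py -dir "$proteome_dir" -phylogenies "$phylo_method" \
        -threads "$nthr" -ratio_ortho "$r" \
        -path_diamond "$HOME/Programs/diamond_2.1.4/diamond" \
        -path_fasttree "$HOME/Programs/FastTree/FastTree"
    mv dir_* "$out"
}

# Launch the three runs one after another (call this explicitly).
run_all() {
    for r in 0.3 0.5 0.7; do
        run_one "$r"
    done
}
```

Calling run_all from the directory that holds broccoli.py launches the three runs sequentially; nothing runs when the functions are merely defined.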

Here is the stdout of each run. No errors were reported.

r=0.3

            Broccoli v1.1


 --- STEP 1: kmer clustering

 # parameters
 input dir     : input_dataset_v1
 kmer size     : 100
 kmer nb aa    : 15

 # check input files
 3 input files
 95947 sequences

 # kmer clustering
 3 proteomes on 8 threads
 -> 85022 proteins saved for the next step


 --- STEP 2: phylomes

 # parameters
 e_value     : 0.001
 nb_hits     : 6
 gaps        : 0.7
 phylogenies : maximum likelihood
 threads     : 8

 # check input files
 3 input fasta files
 85022 sequences

 # build phylomes ... be patient
 done


 --- STEP 3: network analysis

 ## parameters
 species overlap  : 0.5
 min edge weight  : 0.1
 min nb hits      : 2
 chimeric edges   : 0.5
 chimeric species : 3
 threads          : 8

 ## get ortho and para
 extract ortho from similarity
 extract ortho from trees
 remove ortho found only once
 extract para from trees

 ## network analysis
 build network:
      _ 68710 nodes
      _ 139250 edges
 load similarity search outputs
 compute lcc for each node
 apply LPA and corrections:
      _ 14984 connected components
      _ 17377 communities
      _ 2 chimeric proteins
      _ 1437 spurious hits removed


 --- STEP 4: orthologous pairs

 ## parameters
 ratio ortho  : 0.3
 not same sp  : False
 threads      : 8

 ## load data
 load NO tree results
 load tree results
 load OGs

 ## analyse 15509 orthologous groups 1 by 1
 done

r=0.5

            Broccoli v1.1


 --- STEP 1: kmer clustering

 # parameters
 input dir     : input_dataset_v1
 kmer size     : 100
 kmer nb aa    : 15

 # check input files
 3 input files
 95947 sequences

 # kmer clustering
 3 proteomes on 8 threads
 -> 85022 proteins saved for the next step


 --- STEP 2: phylomes

 # parameters
 e_value     : 0.001
 nb_hits     : 6
 gaps        : 0.7
 phylogenies : maximum likelihood
 threads     : 8

 # check input files
 3 input fasta files
 85022 sequences

 # build phylomes ... be patient
 done

 --- STEP 3: network analysis

 ## parameters
 species overlap  : 0.5
 min edge weight  : 0.1
 min nb hits      : 2
 chimeric edges   : 0.5
 chimeric species : 3
 threads          : 8

 ## get ortho and para
 extract ortho from similarity
 extract ortho from trees
 remove ortho found only once
 extract para from trees

 ## network analysis
 build network:
      _ 68710 nodes
      _ 139250 edges
 load similarity search outputs
 compute lcc for each node
 apply LPA and corrections:
      _ 14984 connected components
      _ 17377 communities
      _ 2 chimeric proteins
      _ 1437 spurious hits removed


 --- STEP 4: orthologous pairs

 ## parameters
 ratio ortho  : 0.5
 not same sp  : False
 threads      : 8

 ## load data
 load NO tree results
 load tree results
 load OGs

 ## analyse 15509 orthologous groups 1 by 1
 done

r=0.7

            Broccoli v1.1


 --- STEP 1: kmer clustering

 # parameters
 input dir     : input_dataset_v1
 kmer size     : 100
 kmer nb aa    : 15

 # check input files
 3 input files
 95947 sequences

 # kmer clustering
 3 proteomes on 8 threads
 -> 85022 proteins saved for the next step


 --- STEP 2: phylomes

 # parameters
 e_value     : 0.001
 nb_hits     : 6
 gaps        : 0.7
 phylogenies : maximum likelihood
 threads     : 8

 # check input files
 3 input fasta files
 85022 sequences

 # build phylomes ... be patient
 done


 --- STEP 3: network analysis

 ## parameters
 species overlap  : 0.5
 min edge weight  : 0.1
 min nb hits      : 2
 chimeric edges   : 0.5
 chimeric species : 3
 threads          : 8

 ## get ortho and para
 extract ortho from similarity
 extract ortho from trees
 remove ortho found only once
 extract para from trees

 ## network analysis
 build network:
      _ 68710 nodes
      _ 139250 edges
 load similarity search outputs
 compute lcc for each node
 apply LPA and corrections:
      _ 14984 connected components
      _ 17377 communities
      _ 2 chimeric proteins
      _ 1437 spurious hits removed


 --- STEP 4: orthologous pairs

 ## parameters
 ratio ortho  : 0.7
 not same sp  : False
 threads      : 8

 ## load data
 load NO tree results
 load tree results
 load OGs

 ## analyse 15509 orthologous groups 1 by 1
 done
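The logs are identical up to STEP 4, which is presumably expected, since ratio ortho only appears among the STEP 4 parameters. The real question is whether the step 4 files themselves differ. A minimal sketch for checking that, assuming Broccoli wrote its step 4 output to a folder named dir_step4 (moved by the mv dir_* line) and that the runs were stored in ML_parameter_r03, ML_parameter_r05 and ML_parameter_r07 (the last two names are hypothetical, following the r03 pattern):

```shell
#!/usr/bin/env bash
# Sketch: report whether two runs produced identical dir_step4 folders.
# Folder names ML_parameter_r05 / ML_parameter_r07 are assumptions.

same_step4() {
    local a=$1 b=$2 status=0
    # diff -rq exits 0 if the trees match, 1 if they differ, 2 on trouble.
    diff -rq "$a/dir_step4" "$b/dir_step4" > /dev/null || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}

# Compare the r=0.5 and r=0.7 runs against the r=0.3 run.
compare_all() {
    for r in r05 r07; do
        echo "r03 vs $r: $(same_step4 ML_parameter_r03 "ML_parameter_$r")"
    done
}
```

Running compare_all from the folder that contains the ML_parameter_* directories prints one line per comparison; "error" means a dir_step4 folder is missing. diff -rq only says whether any file differs, not which orthologous pairs changed.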

Thanks in advance,