frederic-mahe / stampa

Sequence Taxonomic Assignment by Massive Pairwise Alignments

bsub: command not found

frederic-mahe opened this issue

To speed up taxonomic assignment, the STAMPA pipeline described in this repository splits the input dataset into small chunks and spreads the computational load across a cluster using the LSF scheduler (with the bsub command). If you don't have access to a cluster with LSF installed, you can run the whole analysis on a single machine (i.e. multithreaded, but not distributed across cluster nodes), using the commands below:

# variables
QUERY="representatives.fas"
DATABASE="V4_references.fas"
THREADS=8

# search for best hits
vsearch \
    --usearch_global ${QUERY} \
    --threads ${THREADS} \
    --dbmask none \
    --qmask none \
    --rowlen 0 \
    --notrunclabels \
    --userfields query+id1+target \
    --maxaccepts 0 \
    --maxrejects 32 \
    --top_hits_only \
    --output_no_hits \
    --db ${DATABASE} \
    --id 0.5 \
    --iddef 1 \
    --userout - | sed 's/;size=/_/ ; s/;//' > hits.representatives

# when several references tie for best hit, find their last common ancestor
# (the working directory is quoted so that paths containing spaces still work)
python stampa_merge.py "$(pwd)"

# sort by decreasing abundance
sort -k2,2nr -k1,1d results.representatives > representatives.results
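
To see what the sed step in the search command does, here is a minimal sketch run on a single made-up output line. The amplicon name, identity value and reference name are hypothetical, and the line assumes vsearch-style headers carrying a ";size=" abundance annotation with a trailing semicolon:

```shell
# Hypothetical hit line: query header, identity, target (tab-separated).
# These values are invented for illustration only.
line=$(printf 'a1b2c3;size=1500;\t98.7\treference_1')

# Same sed expression as in the search command:
#   s/;size=/_/  replaces the first ";size=" with "_"
#   s/;//        then removes the first remaining ";"
cleaned=$(printf '%s\n' "${line}" | sed 's/;size=/_/ ; s/;//')

printf '%s\n' "${cleaned}"
```

With this input, the query field becomes "a1b2c3_1500", so the abundance stays attached to the amplicon name. Neither substitution uses the g flag, so only the first match on each line is rewritten.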