frcamacho / Find-pBGCs

A pipeline of scripts to extract and analyze phage-encoded Biosynthetic Gene Clusters (pBGCs)

Find-pBGCs

Now published in Current Biology: https://www.cell.com/current-biology/fulltext/S0960-9822(21)00744-2

Depends on

ncbi-genome-download: https://github.com/kblin/ncbi-genome-download
ProphET: https://github.com/jaumlrc/ProphET
AntiSMASH: https://github.com/antismash/antismash
genbank_to_fasta.py: https://github.com/Coaxecva/GenBank-to-FASTA
bioawk
blast
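
Because a full run takes days, it may be worth confirming that each dependency is reachable before starting. The following is a minimal sketch and not part of the pipeline itself: `missing_tools` is a hypothetical helper, and the executable names in the example call (such as `blastn` and `antismash`) are assumptions about how the tools are installed, so adjust them to match your system.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command that is not
# found on PATH. Returns non-zero if at least one command is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Example call. The names below are assumed executable names, not ones
# confirmed by this README:
# missing_tools ncbi-genome-download bioawk blastn antismash
```

Scripts such as ProphET or genbank_to_fasta.py may be invoked by path rather than found on PATH, in which case a file-existence test would be the more suitable check.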

Note that this takes a very long time to run: about 2 days on a 64-core machine with 512 GB of RAM, using ~500 GB of disk space. On a desktop it would likely take weeks, if it finishes at all.
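
Given the disk footprint, a quick check of free space in the working directory can save a failed run. This is a sketch under stated assumptions: `has_free_gb` is a hypothetical helper, the 500 GB threshold in the example simply mirrors the figure in the note above, and actual usage may differ between runs.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the filesystem holding directory $1
# has at least $2 gigabytes available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the data line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example call, using the ~500 GB figure from the note above:
# has_free_gb . 500 || echo "less than 500 GB free in this directory"
```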

WORKFLOW

ncbi-genome-download --parallel 64 --format fasta,gff --assembly-level complete bacteria

Find-pBGCs.sh refseq/bacteria/
