pereiramemo / GeoBlast

Identify geographic location of amplicon sequences based on blast searches

GeoBlast pipeline

Identify geographic location of 16S rRNA amplicon sequences based on blast searches

The pipeline consists of three modules:

  1. Blast search and blast output parsing
  2. GenBank files download
  3. Extract geographic location from GenBank files
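
The handoff between modules 1 and 2 can be pictured with a small sketch. It splits a filtered BLAST table by query and writes one accession list per query, using the file names from the Output section. The function name `split_hits_by_query`, the column positions (column 1 = query ID, column 2 = subject accession) and the accession IDs are assumptions made for illustration, not taken from the GeoBlast source.

```shell
#!/usr/bin/env bash
# Sketch only: helper name, column positions and sample IDs are assumptions.

split_hits_by_query() {
  # $1: filtered BLAST table (TSV), $2: output directory
  local blout="$1" out_dir="$2" query
  cut -f1 "$blout" | sort -u | while IFS= read -r query; do
    mkdir -p "$out_dir/$query"
    # Rows of the filtered table that belong to this query
    awk -F'\t' -v q="$query" '$1 == q' "$blout" \
      > "$out_dir/$query/query_blout_filt.tsv"
    # Unique subject accessions, one per line
    cut -f2 "$out_dir/$query/query_blout_filt.tsv" | sort -u \
      > "$out_dir/$query/acc2download.txt"
  done
}

tmp_dir=$(mktemp -d)
trap 'rm -rf "$tmp_dir"' EXIT

# Made-up hits: q1 matches ACC0001 twice and ACC0002 once, q2 matches ACC0003
printf 'q1\tACC0001\nq1\tACC0002\nq1\tACC0001\nq2\tACC0003\n' \
  > "$tmp_dir/blout_filt.tsv"

split_hits_by_query "$tmp_dir/blout_filt.tsv" "$tmp_dir/out"
cat "$tmp_dir/out/q1/acc2download.txt"
```

Each `acc2download.txt` would then be the input of the download step; how GeoBlast performs the download itself is not shown here.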

Install

GeoBlast runs with Docker or Singularity.
To install it, simply download one of these two wrapper scripts:
Docker: geoblast_runner.sh
Singularity: geoblast_runner.sh

And add the execute permission:

chmod +x geoblast_runner.sh

Usage

Usage: geoblast_runner.bash <input file> <output directory> <options>
--help                          print this help
--min_id NUM                    minimum percentage of identity
--min_perc_len NUM              minimum alignment percentage length
--e_val NUM                     e-value
--nslots NUM                    number of slots (default 2)
--overwrite t|f                 overwrite current directory (default f)
--sample_name CHAR              sample name (default input file name)
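
To make the three thresholds concrete, here is a minimal sketch of how a BLAST table could be filtered with them. It is not the pipeline's own code. The column layout (qseqid, sseqid, pident, length, qlen, evalue), the definition of alignment percentage length as 100 * alignment length / query length, the function name `filter_blout` and the sample rows are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMED columns: qseqid sseqid pident length qlen evalue

filter_blout() {
  # $1: BLAST table (TSV)
  # $2: minimum identity (%), $3: minimum alignment length (% of query)
  # $4: maximum e-value
  awk -F'\t' -v min_id="$2" -v min_len="$3" -v e_val="$4" '
    ($3 + 0 >= min_id + 0) &&
    (100 * $4 / $5 >= min_len + 0) &&
    ($6 + 0 <= e_val + 0)
  ' "$1"
}

tmp_dir=$(mktemp -d)
{
  printf 'q1\tACC0001\t99.2\t250\t253\t1e-120\n'  # passes all three checks
  printf 'q1\tACC0002\t91.0\t250\t253\t1e-90\n'   # identity too low
  printf 'q2\tACC0003\t98.5\t120\t253\t1e-50\n'   # alignment too short
} > "$tmp_dir/blout.tsv"

filtered=$(filter_blout "$tmp_dir/blout.tsv" 97 90 1e-5)
printf '%s\n' "$filtered"
rm -r "$tmp_dir"
```

With these made-up rows only the ACC0001 hit survives. Lowering the identity threshold from 97 to 90 would also keep ACC0002.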

Output

/output_dir:
  blout.tsv (raw blast output)
  blout_filt.tsv (filtered blast output)
  geoblast_output.tsv (geoblast final output table)
  /<query>:
    acc2download.txt (list of acc hits to be downloaded)
    downloaded.gbk (downloaded gbk files of hits)
    query_blout_filt.tsv (section of blout_filt.tsv corresponding to <query>)
    parsed_gbk.tsv (parsed fields of gbk files)
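
As an illustration of the step from downloaded.gbk to parsed_gbk.tsv, the sketch below pulls the accession and two geographic qualifiers out of a GenBank flat file. Which fields GeoBlast actually parses is not stated above, so the choice of `/country` (or its newer name `/geo_loc_name`) and `/lat_lon`, the function name `parse_gbk`, the three-column output and the sample records are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: parsed fields, helper name and sample records are assumptions.
# Qualifier values wrapped over several lines are not handled.

parse_gbk() {
  # $1: GenBank flat file; prints accession, country, lat_lon (TSV)
  awk '
    function qualifier_value(line) {
      sub(/^[^"]*"/, "", line)   # drop everything up to the opening quote
      sub(/".*$/, "", line)      # drop the closing quote and the rest
      return line
    }
    $1 == "ACCESSION"                 { acc = $2 }
    /^ +\/(country|geo_loc_name)="/   { country = qualifier_value($0) }
    /^ +\/lat_lon="/                  { lat_lon = qualifier_value($0) }
    $0 == "//" {                      # end of record: emit one row and reset
      printf "%s\t%s\t%s\n", acc,
        (country == "" ? "NA" : country),
        (lat_lon == "" ? "NA" : lat_lon)
      acc = country = lat_lon = ""
    }
  ' "$1"
}

tmp_dir=$(mktemp -d)
cat > "$tmp_dir/downloaded.gbk" <<'EOF'
LOCUS       ACC0001                  253 bp    DNA     linear   ENV
ACCESSION   ACC0001
FEATURES             Location/Qualifiers
     source          1..253
                     /organism="uncultured bacterium"
                     /country="Uruguay: Montevideo"
                     /lat_lon="34.90 S 56.16 W"
ORIGIN
//
LOCUS       ACC0002                  253 bp    DNA     linear   ENV
ACCESSION   ACC0002
FEATURES             Location/Qualifiers
     source          1..253
                     /organism="uncultured bacterium"
ORIGIN
//
EOF

parsed=$(parse_gbk "$tmp_dir/downloaded.gbk")
printf '%s\n' "$parsed"
rm -r "$tmp_dir"
```

Records without a location qualifier are reported as NA here rather than dropped, so every downloaded accession keeps a row in the table.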
