This tool is design to classify metagenomic sequences (marker genes, genomes and amplicon reads) using a Hierarchical Taxonomic Classifier.
Please check also the wiki for more information.
The stag classifier requires:
- Python 3.7 (or higher)
- HMMER3 (or Infernal)
- Easel (link)
- seqtk
- prodigal (to predict genes in genomes)
- python library:
- numpy
- pandas
- sklearn
- h5py = 2.10.0
If you have conda, you can install all the dependencies in conda_env_stag.yaml
.
See Installation wiki for more info.
git clone https://github.com/zellerlab/stag.git
cd stag
# if environment is needed
conda env create -f conda_env_stag.yaml
python setup.py bdist_wheel
pip install --no-deps --force-reinstall dist/*.whl
Note: in the following examples we assume that the python script stag
is in the system path.
# if environment was installed
conda activate stag
# test the installation
stag test
Given a fasta file (let's say unknown_seq.fasta
), you can find the taxonomy annotation of these
sequences using:
stag classify -d test_db.stagDB -i unknown_seq.fasta
The output is:
sequence taxonomy
geneA d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus
geneB d__Bacteria
geneC d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria
You can either create a database (see Create a database), or use one that we already compiled:
Given a fasta file (let's say unknown_genome.fasta
), you can find the taxonomy annotation of this genome with:
stag classify_genome -i unknown_genome.fasta -d gtdb_30.stagDB -o res_dir
The output is saved in the directory res_dir
. Inside you will find the file genome_annotation
with the annotation
in the same format as in the gene classification. More information on the other files can be found here.
To classify multiple genomes, you can use:
stag classify_genome -D all/genomes/dir -d gtdb_30.stagDB -o res_dir
Where all/genomes/dir
is a directory, and all fasta files inside the directory will be classified.
Finally, you can find some databases to classify genomes (gtdb_30.stagDB
in the examples) here.