TransAnnot predicts protein functions, orthologous relationships and biological pathways for a whole newly sequenced transcriptome. It uses the MMseqs2 reciprocal best hit (rbh) search to obtain the closest homologs from the UniProtKB database (or a user-defined database) and infers protein function, structure and orthologous groups from the identified homologs. On request, it can first assemble raw sequencing reads at the protein level using PLASS (Protein-Level ASSembler) before functional annotation.
Possible inputs are:
- assembled transcriptomes (obtained e.g. using Trinity) or raw transcriptome reads, which will be assembled at the protein level using plass
- metatranscriptomes
- single-organism transcriptomes; in this case it is possible to check for contamination with the contamination module, which is based on the MMseqs2 taxonomy workflow
Available modules are:
- assembly: assembles raw sequencing reads into large genomic fragments (contigs)
- annotate: finds homologs for the assembled contigs in a user-defined protein sequence database (default UniProtKB) using the reciprocal best hit search (rbh module) from the MMseqs2 suite
- contamination: checks for contaminated contigs using the easy-taxonomy module from the MMseqs2 suite; this approach uses the taxonomy assignment of every contig to identify contamination
- createdb: creates a database from the sequence space (obtained from the downloaddb module) in the format required by the MMseqs2 rbh module
- downloaddb: downloads the user-defined database that serves as the search space for homology detection
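The reciprocal best hit idea behind the annotate module can be illustrated with a small toy script. This is only a sketch and not the MMseqs2 implementation: the three-column hit tables (query, target, score), the function names `best_hits` and `reciprocal_best_hits`, and the contig and protein names are all assumptions made for illustration.

```shell
# Toy illustration of reciprocal best hits (RBH); this is NOT the MMseqs2 code.
# Assumed input format (hypothetical): query<TAB>target<TAB>score, one hit per line.

# Print the highest-scoring target for every query as query<TAB>target.
best_hits() {
  awk -F'\t' '
    !($1 in best) || ($3 + 0) > best[$1] { best[$1] = $3 + 0; hit[$1] = $2 }
    END { for (q in hit) print q "\t" hit[q] }
  ' "$1"
}

# Print pairs a<TAB>b where b is the best hit of a (forward search)
# and a is the best hit of b (reverse search).
reciprocal_best_hits() {
  fwd_best=$(mktemp) || return 1
  rev_best=$(mktemp) || return 1
  best_hits "$1" > "$fwd_best"
  best_hits "$2" > "$rev_best"
  awk -F'\t' '
    FILENAME == ARGV[1] { rev[$1] = $2; next }
    ($2 in rev) && rev[$2] == $1 { print $1 "\t" $2 }
  ' "$rev_best" "$fwd_best" | sort
  rm -f "$fwd_best" "$rev_best"
}

# Demo with made-up contigs and proteins.
demo_dir=$(mktemp -d)
printf 'contig1\tprotA\t250\ncontig1\tprotB\t90\ncontig2\tprotA\t120\n' > "$demo_dir/forward.tsv"
printf 'protA\tcontig1\t245\nprotA\tcontig2\t118\nprotB\tcontig1\t88\n' > "$demo_dir/reverse.tsv"
reciprocal_best_hits "$demo_dir/forward.tsv" "$demo_dir/reverse.tsv"   # prints: contig1<TAB>protA
```

In the demo, contig2 also hits protA, but protA's best hit is contig1, so only the contig1/protA pair is reciprocal and reported.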
In this step, the sequence database for the homology search is downloaded. The default database is UniProtKB, which can be obtained with the following command:
transannot downloaddb UniProtKB <outDB> <tmp> [options]
To see the other available databases, run:
mmseqs databases -h
Then download the preferred database with the command below, using exactly the same keyword as listed by mmseqs databases -h:
transannot downloaddb <selection> <outDB> <tmp> [options]
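The download step can be wrapped in a small helper, sketched below. The function name `download_db`, the `DRY_RUN` switch and the default output paths are hypothetical and only serve the example; the only command taken from this README is `transannot downloaddb <selection> <outDB> <tmp>`. `UniProtKB/Swiss-Prot` is used as an example of a keyword listed by `mmseqs databases -h`; check that listing for the keywords available in your installation.

```shell
# Hypothetical helper: download a database with transannot, defaulting to UniProtKB.
# DRY_RUN=1 (the default here) only prints the command, so the sketch is safe to run.
download_db() {
  selection=${1:-UniProtKB}
  out_db=${2:-db/$selection}
  tmp_dir=${3:-tmp}
  # Build the command once so the dry run and the real run stay identical.
  set -- transannot downloaddb "$selection" "$out_db" "$tmp_dir"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  elif command -v transannot >/dev/null 2>&1; then
    mkdir -p "$(dirname "$out_db")" "$tmp_dir"
    "$@"
  else
    echo "transannot not found in PATH" >&2
    return 127
  fi
}

download_db                                      # prints: transannot downloaddb UniProtKB db/UniProtKB tmp
download_db UniProtKB/Swiss-Prot swissprotDB tmp
```

Setting `DRY_RUN=0` runs the printed command for real, provided transannot is installed.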
annotate -h provides details on the sequence types and databases accepted by the annotate module.
The contamination module checks for contamination in the transcriptomic data using the MMseqs2 easy-taxonomy module.
transannot contamination <Input.fasta> <targetDB> <outPath> <tmp> [options]
You can find the report of taxonomy assignments in the outPath folder.
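How per-contig taxonomy assignments can point to contamination is illustrated by the toy sketch below. It is not TransAnnot's actual logic and does not read the real report format: the two-column table (contig, taxon), the function name `flag_contaminants` and the majority-taxon rule are assumptions chosen to keep the example short.

```shell
# Toy sketch: flag contigs whose assigned taxon differs from the most common one.
# Assumed input (hypothetical, NOT the real report layout): contig<TAB>taxon,
# with one line per contig.
flag_contaminants() {
  awk -F'\t' '
    { taxon[$1] = $2; count[$2]++; order[NR] = $1 }
    END {
      max = 0
      # Pick the taxon assigned to the most contigs (ties are broken arbitrarily).
      for (t in count) if (count[t] > max) { max = count[t]; major = t }
      # Report every contig assigned to a different taxon, in input order.
      for (i = 1; i <= NR; i++) {
        c = order[i]
        if (taxon[c] != major) print c "\t" taxon[c]
      }
    }
  ' "$1"
}

# Demo with made-up assignments for a zebrafish transcriptome.
assignments=$(mktemp)
printf 'contig1\tDanio rerio\ncontig2\tDanio rerio\ncontig3\tEscherichia coli\ncontig4\tDanio rerio\n' > "$assignments"
flag_contaminants "$assignments"   # prints: contig3<TAB>Escherichia coli
```

A simple majority rule like this only makes sense for single-organism transcriptomes; for metatranscriptomes, mixed taxa are expected.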
The tmp folder keeps temporary files. By default, all intermediate output files from the different modules are kept in this folder. To clear tmp, pass the --remove-tmp-files parameter.