jvfe / metamage_latch

A Latch workflow for taxonomic classification, assembly, binning and annotation of short-read metagenomics datasets

Home Page: https://console.latch.bio/explore/67933/info

metamage

metamage is a workflow for taxonomic classification, assembly, binning and annotation of short-read host-associated metagenomics datasets.

    graph TD;
        reads[(Short-read paired-end metagenomics data)]-->hostread(Trimming and host read removal)
        hostread-->|Reads| tax(Taxonomic classification with Kaiju)
        hostread-->|Reads| assem(Assembly with MEGAHIT)
        assem-.->|Assembled contigs| metaq(MetaQuast evaluation)
        assem-->|Assembled contigs| func(Functional annotation)
        assem-->|Assembled contigs| binprep(Binning preparation)
        hostread-->|Reads| binprep
        assem-->|Assembled contigs| bin(Binning with MetaBAT2)
        binprep-->|Depth file| bin
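
The diagram can also be read as a dependency graph between stages. As a minimal sketch (not part of metamage itself), the snippet below uses Python's standard `graphlib` module to derive one valid execution order. The node names are copied from the diagram; the `PIPELINE` dict and the `execution_order` helper are hypothetical names introduced only for illustration.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on,
# mirroring the arrows in the diagram above.
PIPELINE = {
    "hostread": {"reads"},
    "tax": {"hostread"},
    "assem": {"hostread"},
    "metaq": {"assem"},
    "func": {"assem"},
    "binprep": {"assem", "hostread"},
    "bin": {"assem", "binprep"},
}


def execution_order(graph):
    """Return one ordering in which every stage follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
print(order)
```

Several orderings are valid, because taxonomic classification, assembly evaluation and functional annotation do not depend on each other.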

It's composed of:

Read pre-processing and host read removal

  • fastp for read trimming and other general pre-processing 1
  • BowTie2 for mapping to the host genome and extracting unaligned reads 2
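
As a rough sketch of what this step involves, the function below assembles command lines for trimming, host index construction and host read removal. It only builds argument lists and executes nothing. The flags are standard `fastp`, `bowtie2-build` and `bowtie2` options, but the exact options metamage passes are not documented here, so treat the choice of flags, the file names and the `host_removal_commands` helper as assumptions.

```python
import os
from pathlib import Path


def host_removal_commands(r1, r2, host_fasta, sample, threads=4):
    """Build (but do not run) the trimming and host-removal commands."""
    out = Path(sample)
    # Hypothetical file names inside directories named in the output tree.
    trimmed1 = out / "fastp_results" / f"{sample}_1.trim.fastq.gz"
    trimmed2 = out / "fastp_results" / f"{sample}_2.trim.fastq.gz"
    index_prefix = out / f"{sample}_bt_idx" / sample
    unaligned = out / f"{sample}_bt_unaligned" / f"{sample}_unaligned.fastq.gz"

    fastp = ["fastp", "--in1", r1, "--in2", r2,
             "--out1", str(trimmed1), "--out2", str(trimmed2),
             "--thread", str(threads)]
    build = ["bowtie2-build", host_fasta, str(index_prefix)]
    # --un-conc-gz keeps read pairs that did not align concordantly to the
    # host; the alignments themselves are discarded.
    align = ["bowtie2", "-x", str(index_prefix),
             "-1", str(trimmed1), "-2", str(trimmed2),
             "--un-conc-gz", str(unaligned),
             "-S", os.devnull, "-p", str(threads)]
    return [fastp, build, align]


commands = host_removal_commands("s1_1.fastq.gz", "s1_2.fastq.gz", "host.fa", "s1")
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the output directories exist.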

Assembly

  • MEGAHIT for assembly 3
  • MetaQuast for assembly evaluation 9

Functional annotation

  • Macrel for predicting Antimicrobial Peptide (AMP)-like sequences from contigs 4
  • fARGene for identifying Antimicrobial Resistance Genes (ARGs) from contigs 5
  • GECCO for predicting biosynthetic gene clusters (BGCs) from contigs 6
  • Prodigal for protein-coding gene prediction from contigs 7
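
All of these tools take the assembled contigs as input. The sketch below shows how two of them might be invoked, using only options that Prodigal and Macrel document (`-p meta` selects Prodigal's metagenomic mode). The output file names and the `annotation_commands` helper are hypothetical, and the options metamage actually uses may differ; fARGene and GECCO are left out rather than guessing their arguments.

```python
def annotation_commands(contigs, outdir):
    """Build (but do not run) example Prodigal and Macrel commands."""
    return {
        "prodigal": [
            "prodigal",
            "-i", contigs,                                  # input contigs
            "-a", f"{outdir}/prodigal_results/proteins.faa",  # predicted proteins
            "-o", f"{outdir}/prodigal_results/genes.gbk",     # gene coordinates
            "-p", "meta",                                   # metagenomic mode
        ],
        "macrel": [
            "macrel", "contigs",
            "--fasta", contigs,
            "--output", f"{outdir}/macrel_results",
        ],
    }


annotation = annotation_commands("final.contigs.fa", "s1")
for tool, cmd in annotation.items():
    print(tool, "->", " ".join(cmd))
```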

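Binning

  • BowTie2 and Samtools for mapping reads to the assembled contigs and preparing the depth file 8
  • MetaBAT2 for binning 10
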
Taxonomic classification of reads

  • Kaiju for taxonomic classification
  • KronaTools for visualizing taxonomic classification results
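
To get a quick feel for the classification results before opening the Krona plot, Kaiju's output can be summarized directly. By default Kaiju writes one tab-separated line per read with a `C`/`U` flag (classified or unclassified), the read name, and an NCBI taxon ID. The sketch below counts both; the `summarize_kaiju` helper and the example records are invented for illustration and are not produced by metamage.

```python
from collections import Counter


def summarize_kaiju(lines):
    """Count classified/unclassified reads and reads per taxon ID."""
    status = Counter()
    taxa = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        flag, _read_name, taxon_id = fields[:3]
        status[flag] += 1
        if flag == "C":
            taxa[taxon_id] += 1
    return status, taxa


# Made-up records in Kaiju's three-column default format.
example = [
    "C\tread1\t562",
    "C\tread2\t562",
    "U\tread3\t0",
    "C\tread4\t1280",
]
status, taxa = summarize_kaiju(example)
print(status["C"], "classified,", status["U"], "unclassified")
print(taxa.most_common(1))
```

For a real run, pass an open file handle instead of the list, since the function accepts any iterable of lines.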

Output tree

  • |metamage
    • |{sample_name}
      • |{sample_name}_bt_idx - Host genome BowTie index
      • |{sample_name}_bt_unaligned - Reads that didn't align to the host genome
      • |fastp_results - Results from trimming with fastp
      • |kaiju
      • |MEGAHIT
      • |MetaQuast - Assembly evaluation report
      • |{sample_name}_assembly_idx - BowTie Index from assembly data
      • |{sample_name}_assembly_sorted.bam - Reads aligned to assembly contigs
      • |METABAT
      • |fargene_results
      • |gecco_results
      • |macrel_results
      • |prodigal_results
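
For downstream scripts it can help to build these locations programmatically. The helper below is a small sketch that reconstructs the paths in the tree for a given sample; the function name `output_paths` and the dictionary keys are hypothetical, and only the directory and file names come from the tree above.

```python
from pathlib import PurePosixPath


def output_paths(sample_name, root="metamage"):
    """Map short keys to the per-sample output locations listed above."""
    base = PurePosixPath(root) / sample_name
    names = {
        "host_index": f"{sample_name}_bt_idx",
        "host_unaligned": f"{sample_name}_bt_unaligned",
        "fastp": "fastp_results",
        "kaiju": "kaiju",
        "megahit": "MEGAHIT",
        "metaquast": "MetaQuast",
        "assembly_index": f"{sample_name}_assembly_idx",
        "assembly_bam": f"{sample_name}_assembly_sorted.bam",
        "metabat": "METABAT",
        "fargene": "fargene_results",
        "gecco": "gecco_results",
        "macrel": "macrel_results",
        "prodigal": "prodigal_results",
    }
    return {key: base / name for key, name in names.items()}


paths = output_paths("sample1")
print(paths["assembly_bam"])
```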

Where to get the data?

  • Kaiju indexes can be built from a reference database, but pre-built indexes are also available in the sidebar of the Kaiju website.

  • Reference host genomes can be acquired from a variety of databases, for example Ensembl.

Footnotes

  1. Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560

  2. Langmead B, Wilks C., Antonescu V., Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. bty648.

  3. Li, D., Luo, R., Liu, C.M., Leung, C.M., Ting, H.F., Sadakane, K., Yamashita, H. and Lam, T.W., 2016. MEGAHIT v1.0: A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies and Community Practices. Methods.

  4. Santos-Júnior CD, Pan S, Zhao X, Coelho LP. 2020. Macrel: antimicrobial peptide screening in genomes and metagenomes. PeerJ 8:e10555. DOI: 10.7717/peerj.10555

  5. Berglund, F., Österlund, T., Boulund, F., Marathe, N. P., Larsson, D. J., & Kristiansson, E. (2019). Identification and reconstruction of novel antibiotic resistance genes from metagenomes. Microbiome, 7(1), 52.

  6. Accurate de novo identification of biosynthetic gene clusters with GECCO. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller. bioRxiv 2021.05.03.442509; doi:10.1101/2021.05.03.442509

  7. Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119

  8. Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li. Twelve years of SAMtools and BCFtools. GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008

  9. Alla Mikheenko, Vladislav Saveliev, Alexey Gurevich, MetaQUAST: evaluation of metagenome assemblies, Bioinformatics (2016) 32 (7): 1088-1090. doi: 10.1093/bioinformatics/btv697

  10. Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. 2019. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7:e7359 https://doi.org/10.7717/peerj.7359

License: MIT License